


SOCIAL STATISTICS 



HARPER’S SOCIAL SCIENCE SERIES 

F. STUART CHAPIN, EDITOR 


HUMAN RELATIONS 

by Carl C. Taylor 
and B. F. Brown 

RURAL SOCIOLOGY 
(Revised Edition) 
by Carl C. Taylor 

AN INTRODUCTION TO ANTHROPOLOGY 

by Wilson D. Wallis

SOCIOLOGY AND EDUCATION 

by Alvin Good 

SOCIAL MOBILITY 

by Pitirim Sorokin 

PROBLEMS OF SOCIAL WELL-BEING 

by J. H. S. Bossard

CONTEMPORARY SOCIOLOGICAL THEORIES 

by Pitirim Sorokin 

SOCIAL WORK ADMINISTRATION 

by Elwood Street

THE SOCIAL WORKER 

IN FAMILY, MEDICAL AND PSYCHIATRIC SOCIAL WORK

by Louise C. Odencrantz

THE SOCIAL WORKER IN GROUP WORK 

by Margaretta Williamson

TRENDS IN AMERICAN SOCIOLOGY 

by George A. Lundberg and others

THE SOCIAL WORKER IN CHILD CARE AND PROTECTION

by Margaretta Williamson 

AMERICAN MINORITY PEOPLES 

by Donald Young 

SOCIAL PSYCHOLOGY 

by Joseph K. Folsom 

PRINCIPLES OF SOCIOLOGY 

by E. T. Hiller 

SOCIAL STATISTICS 

by R. Clyde White



SOCIAL 


STATISTICS 


By 

R. CLYDE WHITE 

Professor of Sociology and Director 
of the Bureau of Social Research 
Indiana University 



HARPER & BROTHERS PUBLISHERS

New York and London

1933



SOCIAL STATISTICS 

Copyright, 1933, by Harper & Brothers
Printed in the U. S. A. 

First Edition 


All rights in this book are reserved.

No part of the text may be reproduced in any manner
whatsoever without permission in writing from 
Harper & Brothers 



Editor’s Introduction


Statistical method has become a fundamental tool to scientific 
advances in sociology and in social work. This book is unique in 
that it combines between two covers divisions of statistical method 
which have hitherto been featured separately only in books on 
economic statistics, or in books on psychological statistics, or in 
books on vital statistics. 

Professor White has woven into a single consistent treatment, 
not only the usual techniques of tabulation, graphic representation, 
the measurement of central tendencies, dispersion and correlation, 
but he has added a simple presentation of the technique of analysis 
of time series, an outline of the chief elements of vital statistics, 
and a suggestive treatment of the technique of social measure- 
ments and the standardization of sociometric scales. 

F. Stuart Chapin 




Preface 


A great deal of work has been done during the last decade in the 
field of what may properly be called “social statistics” as dis- 
tinguished from economic and business statistics. 

Until recent years social statistics connoted only columns of 
figures. It still refers to the tabulation of social data, but it is an 
improved and extended tabulation plus a technique for extracting 
the meaning of such data, namely, the methods of statistical analy- 
sis applied to social data for scientific and practical purposes. 

The forerunners of the present-day social statisticians were such 
men as Quetelet, Pareto, Galton, Mayo-Smith, Giddings, and 
Wright. Giddings belongs in this list, not because he did a great 
amount of statistical work himself, but because of the influence he 
exercised in directing his students into quantitative studies of social 
data. Among the leaders in social statistics today may be mentioned 
Chaddock, Chapin, Dublin, Shelby Harrison, Hexter, Hurlin, 
Ogburn, Rice, Frank A. Ross, Dorothy S. Thomas, Truesdell, 
and many others. These men and their predecessors have cre- 
ated a special division of the field of statistics, and the result is 
that a large number of colleges and universities in the United 
States are now giving courses in this branch of statistics. 

This book represents an effort to adapt statistical methods to 
the data of sociology and social work for teaching purposes in the 
light of the work done by American social statisticians. The meth- 
ods and principles discussed and illustrated are well known. What- 
ever innovation there may be lies in the fact that ordinary meth- 
ods of statistics have been applied systematically to the data of 
sociology and social work. The author believes that a social statis- 
tician learns his technique and acquires the correct habits of thought 
about his work by applying statistical methods to his own data. 
He is not likely to have much of a penchant for social statistics, 
if he learns statistical methods through the use of biological data. 
Familiarity with the data of sociology and social work and practice 
in analyzing these data by statistical methods are fundamental to 
the training of a social statistician. A course in mathematical sta- 
tistics is an excellent thing for a student, but from the viewpoint
of the sociologist and the social worker it is pedagogically inade- 
quate. Prolonged practice in the application of statistical methods 
to social data is essential to develop the thought habits necessary 
in a special field. This book uses illustrative material which cen- 
ters the attention of the student on a problem of sociology or of 
social work. Statistical methods are, then, tools with which he 
may work and are means which may be employed to answer ques- 
tions about sociology and social work. The introductory course in 
statistics contemplated in the preparation of this book is one which 
introduces the student to quantitative aspects of sociology and 
social work. The author believes that this viewpoint has a distinct 
advantage in the training of a social statistician over the viewpoint 
that one kind of illustrative material is as good as another. 

This text is intended to provide for a two-hour course through- 
out the year. The materials for exercises given at the end of each 
chapter (beginning with Chapter V) are to be used as a basis of 
practice in using the particular methods under consideration. Teach- 
ers of social statistics will usually want to introduce some exercise 
material which they have found particularly good or which directs 
the attention of the student to social data in his own city or state. 
Thus, the book does not prevent considerable latitude in the choice 
of materials for laboratory practice, while at the same time it places 
at the disposal of the instructor materials which he may use at his 
own discretion. If the course happens to be a three-hour course 
throughout the year, the author has found that the materials of 
this book can be used satisfactorily by making use of special studies 
by students. It has been the practice of the author to use as much 
of the year as is necessary to cover the methods discussed in the 
book and then to plan special problems for statistical analysis by 
the students. These problems may involve the use of half a dozen 
of the common methods of statistics. One of the problems the 
entire class worked upon was the construction of special and gen- 
eral indexes of public welfare work in Indiana. The construction 
of these indexes involved definition of terms, the consideration and 
estimation of population changes, computation of rates and averages,
determination of weights, and the computation of trends and
cyclical variations. If time permits, several such problems may be 
studied during the year. The student is required to decide what 
methods are necessary to answer the questions raised about the 
problem, and then he is expected to interpret the results of his
work. This experience helps to develop the habits of thought and 
imagination necessary to a social statistician. 

Four chapters have been included in this volume which are not 
usually found in general texts on statistics. They are: Chapter II, 
“Sources of Published Statistics”; Chapter IV, “Working Out a
Statistical Problem”; Chapter XIV, “Vital Statistics”; and Chapter
XV, “Rating Scales.” Unless the instructor gives a lecture on
standard sources of statistical material, the student is likely to have
an uncertain idea about where to turn when he wants certain kinds
of data. For this reason Chapter II was included as a sort of
“bibliography” for the student in statistics. Most texts on statistics
discuss the procedure required in working out a statistical problem,
but the discussion is generally scattered through the book. There 
is no objection to this, but it seems to the present author that it is 
desirable to present this subject in a separate chapter so that the 
procedure may be shown more systematically. A number of mono- 
graphs have been written on vital statistics, but general texts usu- 
ally have given only scant attention to the subject. Yet the social 
statistician is constantly concerned with births, deaths, morbidity, 
and population. It seems reasonable, therefore, in a book on social 
statistics to give a separate chapter to the presentation of a few of 
the methods of analysis used in the study of vital statistics. Rating 
scales are fairly new as statistical tools, outside of the scales for 
the measurement of intelligence, but they give promise of much 
greater importance to the social statistician in the future. It was 
felt that the student should become familiar with the nature and 
possibilities of rating scales in sociology and social work. Hence, 
Chapter XV was devoted to this subject. 

The author of a book on social statistics is inevitably indebted to 
a great number of his colleagues, known and unknown. Most of 
all my thanks are due to Professor F. Stuart Chapin, Editor of 
Harper’s Social Science Series. Professor Chapin has read all of the 
manuscript, and in conference and by letter has offered valuable 
criticisms and constructive suggestions far too numerous to men- 
tion in detail. He is entitled to much of the credit for whatever 
merit the book possesses. I wish to express my appreciation to 
Professor Robert E. Chaddock, my former teacher in statistics, 
who stimulated my interest in social statistics and whose clear 
thinking in his writings has been a constant inspiration to me dur- 
ing the years since I sat in his classes. My thanks are due to Pro- 
fessor Charles R. Metzger, of Indiana University, and to Miss
R. Elizabeth Cox, my secretary, for checking the computations in 
the book and for assisting with the proofreading. I also want to 
express appreciation to Professor U. G. Weatherly of Indiana 
University, for his interest and encouragement during the time 
the manuscript has been in preparation; he has helped to clarify
my conception of the function of statistical methods in sociology 
by kindly and philosophic criticism. 

Acknowledgment is here made to the Johns Hopkins Press for
permission to summarize extensively from Schmeckebier’s The
Statistical Work of the National Government; to the University
of Chicago Press for permission to quote at length from Thurstone
and Chave’s The Measurement of Attitudes; to Houghton Mifflin
Company for permission to reprint the tables of logarithms in
Kuhn and Morris’ Mathematics of Finance; to George Routledge
and Sons for permission to use considerable material from Dorothy
S. Thomas’ Social Aspects of the Business Cycle. For aid in the
assembling of statistical material for illustrative purposes and for 
the use of official reports, I wish to express my gratitude to the 
Indiana Board of State Charities and the Indianapolis Family 
Welfare Society. 

The author wishes to emphasize the fact that he assumes full 
responsibility for the shortcomings of this volume. Those who 
have given advice or have assisted in other ways are in no way re- 
sponsible for its weaknesses. 

Indianapolis, Jan. 2, 1933


R. Clyde White 



TABLE OF CONTENTS 


Introduction v 

Preface vii 

PART I: INTRODUCTION 

Chapter Page 

I. Social Problems and Social Statistics 3 

II. Sources of Published Statistics 29 

III. The Nature of Statistical Research 60 

IV. Working Out a Statistical Problem 81 

PART II: STATISTICAL ANALYSIS 

V. Collection and Assembling of Data 99 

VI. Tabulation of Statistical Data 119

VII. Graphic Presentation 136 

VIII. Measures of Central Tendency 199 

IX. Measures of Dispersion 230 

X. Index Numbers 254 

XI. Measurement of Relationships 277 

XII. The Theory of Probability 317 

XIII. Time Series 343 

XIV. Vital Statistics 384 

XV. Rating Scales 405 

Appendix 425 

Index 469 




LIST OF FIGURES 


Figure Page 

I. Cases Disposed of by Marion County Criminal Court 

for the City of Indianapolis 83 

II. Hollerith Machine Card 85 

III. The Electric Key Punch 86 

IV. The Electric Horizontal Sorting Machine 86 

V. Relation of Business Cycles to Marriage and Di- 
vorce Rates 95 

VI. Hollerith Card for Mortality Study 102 

VII. Report Form Used by the Boards of Children’s 

Guardians, Indiana 104 

VIII. Registration Form 105 

IX. Statistical Card 106 

IX-A. Reverse Side of Figure IX 107 

X. Questionnaire of the U. S. Bureau of Labor Sta- 
tistics 108 

XI. Questionnaire of the U. S. Bureau of Labor Sta- 
tistics 109 

XII. Schedule for the Study of Compensation for Auto- 
mobile Accidents 112-114 

XIII. Schedule Used in a Child Welfare Study 115 

XIV. Work Sheet for Assembling Crime Data Sorted on a 

Tabulating Machine 117 

XV. Work Sheet for Assembling Crime Data — Hand and

Tally Method 117 

XVI. Jail Prisoners per 100,000 Population in Indiana 

Counties 127 

XVII. New Protestant Denominations in Each 50-Year 
Period, 1500 to 1900, as Represented in the 
United States 137 

XVIII. Location of the Southwest Corner of the House

at P 140 

XIX. Rectangular Coordinates 141 

XX. Showing the Cumulative Percentage of Time
Served on a 10-Year Sentence in a Federal Prison
(1) without Deductions for Good Behavior and
(2) with Regular Monthly Deductions for Good
Behavior 146

XXI. The Accumulation of $1,000 at 6 Per Cent Interest
at the End of Each Year of a 10-Year Period 149

XXII. Population of the United States, 1790-1930 (Natural
Scale) 151

XXIII. Population of the United States, 1790-1930 (Semi- 

Logarithmic, or Ratio, Scale) 152 

XXIV. Population of the United States, 1790-1930 (Loga- 
rithms of Population Plotted on the Vertical 
Scale) 153 

XXV. Weighted Index of Public Welfare Work in In- 
diana, 1900-1927 (Semi-Logarithmic Scale) 155 

XXVI. Comparison of Budgetary Estimate and Actual Ex- 
penditures in 1928 Through August, Indianapolis 
Family Welfare Society, in Terms of Cumulative 
Percentages 157 

XXVII. Cumulative Curves Showing the Age Distribution 
of Felons in Indianapolis in 1930 on a “More Than” 
and on a “Less Than” Basis — 651 Felons 159 

XXVIII. Age Distribution of 5,319 Workers 161 

XXIX. Age Distribution of Workers, 5-Year Class-Intervals 163 

XXX. Comparison of the Age Distribution of Employees in 
Six Firms and of the Total Male Population of 
Indianapolis between 15 and 64 Years of Age in 
Terms of Percentage 165 

XXXI. Distribution of Children in the Eighth Grade, 

St. Louis Public Schools, by Ages 167 

XXXII. Distribution of Children in the Eighth Grade, 

St. Louis Public Schools, by Ages, Showing the 
Relations between a Histogram and a Frequency 
Polygon 169

XXXIII. Distribution of Children in the Eighth Grade,

St. Louis Public Schools, by Ages, Comparing the 
Frequency Polygon and the Smoothed Frequency 
Curve 170

XXXIV. Distribution of Children in the Eighth Grade, Com- 
paring the Frequency Polygon with the Ideal 
Frequency Curve 172 




XXXV. Per Cent of Males Attending School Among the 
Native White, Foreign-Born White, Negro and 
“All Other” Population 5 to 20 Years of Age, by 
Specified Age: 1920 173 

XXXVI. 336 Cities in the United States with 25,000 or More 
Population, Which Increased Less Than 120 Per 
Cent between 1920 and 1930 175 

XXXVII. Representing the Percentage of Change in Popula- 
tion of Indianapolis from 105,436 in 1890 to 
314,194 in 1920 176 

XXXVIII. Representing Percentage Change in Population of 
Indianapolis from 105,436 in 1890 to 314,194 in 
1920 by Means of Areas 177 

XXXIX. Representing Percentage Change in Population of 
Indianapolis from 105,436 in 1890 to 314,194 in 
1920 by Means of Cubes 177 

XL. Percentage of the Population of the United States 

Represented by Each Race, 1920 178 

XLI. Percentage of White and Negro Races Among the 
Commitments to Prisons and Reformatories, 1910 
and 1923 179 

XLII. Age Distribution of the Population and of the 

Gainfully Employed over 10 Years of Age 180 

XLIII. New Commitments to Indiana Hospitals for the In- 
sane by Age Groups, Year Ending September 30, 

1929 181 

XLIV. Location of Felonies, January to June, 1929 182 

XLV. Distribution of Homes of Children Using a Public 
Playground Shown by One Dot for Each Home 
and by Concentric Circles of a Quarter-Mile 
and a Half-Mile Radius 184 

XLVI. Percentage of the White Population of Counties of 

Virginia Who Belong to Churches 185 

XLVII. Administration Agencies and Their Functions, New 

York City 187 

XLVIII. Distribution of Intelligence Among 451 Children 

in Dependent Families 203 

XLIX. Location of the Median by Means of Cumulative 

Frequency Curves 213 

L. Distribution of the Children in the Eighth Grade,
St. Louis Public Schools, by Ages: Graphic Loca- 
tion of the Mean, Median, and Mode 226 

LI. Percentile Distribution of Infant Mortality Rates 

in 108 Cities in the United States, 1929 237 

LII. Area of Surface Enclosed by Plus and Minus Once 
the Quartile Deviation from the Median Age of 
Boston Workers 247 

LIII. Areas of Surface Enclosed by Plus and Minus Once 
the Average Deviation and by Plus and Minus 
Once the Standard Deviation from the Mean Age 
of Boston Workers 248 

LIV. Distance Traveled by a Body in Specified Time 281 

LV. Misdemeanant and Felon Rates 284 

LVI. Straight Line Fitted to Misdemeanant and Felon 

Rates 288 

LVII. Types of Standard Curves with the Formula for

Each 289 

LVIII. Felony Data with Fitted Curve and Limits of Error 

of Estimate 291 

LIX. Regression of Y on X, Where Y = -.155 + .476X 301

LX. Scattergram with Line of Means and Freehand 

Curve Superimposed — Crime Data 303 

LXI. Number of Successes (X) and Actual and Theoreti- 
cal Frequencies (Y) in 4096 Throws of 12 Dice 322 
LXII. The Normal Curve of Error 324

LXIII. Normal Curve Determined from Ordinates Expressed 
as Fractional Parts of the Maximum Ordinate, 
Compared with Actual Data 330 

LXIV. Normal Curve Determined from Ratio of y to y₀,

Compared with Actual Data 332 

LXV. Trend of Divorce Rates in Indiana, 1899-1928 348 

LXVI. Divorce Rates and Moving Averages for Four (Cen- 
tered), Five, and Seven Years 352 

LXVII. Cycles Expressed as Deviations from Trend — Mor- 
tality Indexes 373 

LXVIII. Cyclical Variations in Units of σ 376

LXIX. Actual Population of the United States, 1870-1920, 

and Projection of the Curve to 1930 386 

LXX. Growth of the Population of the United States 389 
LXXI. Cumulative Curve of Indianapolis Population, 1930, 
and Estimation of Population 26 to 28 Years of 
Age 391



LIST OF TABLES 


Table Page

I. Nine Kinds of Crime against Property, Showing the 
Number of Each, the Average Distance between 
the Home of the Offender and the Place of the 
Offense, and the Number of Cases in Which the 
Offense Was Committed in the Same Census 
Tract as the Residence 87 

II. Age Distribution of 651 Felons Appearing before 
the Marion County, Indiana, Criminal Court in 
1930 88 

III. Indexes of Employment and Pay-Roll Totals in 

Manufacturing Industries Concerned with 
Leather and Its Products, Yearly Averages, 

1923 to 1929 121 

IV. Poor Asylum Inmates Classified by Age and Sex, 

August 31, 1929. Indiana 122 

V. Jail Prisoners per 100,000 Population in Indiana by 

Counties, October 1, 1928, to September 30, 1929 125

VI. Jail Prisoners per 100,000 Population in Each 
County of Indiana, October 1, 1928, to September
30, 1929, Arrayed According to Rate 126

VII. Frequency Distribution of Jail Imprisonment 

Rates According to Counties 128 

VIII. Five Hundred Marks in English Classified by Sin- 
gle Per Cents 130 

IX. Five Hundred Marks in English Classified in Inter- 
vals of 5 Per Cent 131 

X. The Number of New Protestant Denominations in 
Each 50-Year Period, 1500 to 1900, as Repre- 
sented in the United States 136 

XI. The Annual Accumulation of the Percentage of a 
10-Year Sentence Served Because of Good Conduct
in a Federal Prison 145

XII. The Accumulation of $1,000 at 6 Per Cent Simple 
Interest at the End of Each Year of a 10-Year
Period 148 




XIII. Population of the United States at Each Census, 

1790 to 1930 150

XIV. Weighted Indexes of Public Welfare Work in In- 

diana, 1900 to 1927 154 

XV. Cumulative Percentages of Actual Expenditures 
by Months for 1928 and Cumulated Percentages 
of Budget Estimates for the Entire Year 156 

XVI. Felons Sentenced in the Marion County Criminal 
Court, 1930, According to the Percentage above 
(More Than) or below (Less Than) a Specified 
Age, 651 Felons 158 

XVII. Age Distribution of Male Employees in 6 Indianapo- 
lis Firms 160 

XVIII. Age Distribution of Male Employees in 6 Indianapo- 
lis Firms and of the Total Male Population of 
Indianapolis for the Same Age Periods (Census of 
1920) 162 

XIX. Distribution of Children in the Eighth Grade, 

St. Louis Public Schools, by Ages 166 

XX. 336 Cities in the United States with 25,000 or More
Population, Which Increased Less Than 120 Per
Cent between 1920 and 1930 174

XXI. Percentage of the Population of the United

States Represented by Each Race, 1920 178

XXII. Percentage of White and Negro Races Among the 
Commitments to Prisons and Reformatories, 1910 
and 1923 179 

XXIII. Age Distribution of the Population 10 Years of 
Age and of the Gainfully Employed of Similar
Ages Expressed in Percentage 180 

XXIV. New Commitments to Indiana Hospitals for the
Insane by Age Groups, Year Ending September 
30, 1929 181 

XXV. Distribution of Homes of Children Using a Public 

Playground 183 

XXVI. Weighted Aggregates of Public Welfare Work and 
the Annual Trend Values of the Volume of 
Work, Indiana Board of State Charities, 1900 to
1927 194 




XXVII. Population per Square Mile in Continental United 

States, Excluding Alaska, 1790 to 1930 195 

XXVIII. Patients per 100,000 Population in the Indiana 
Hospitals for the Insane on the Last Day of the 
Fiscal Year, 1900 to 1927 196 

XXIX. Cumulative Percentages of the Budget ($72,000) 
Expended by a Charitable Agency, Fiscal Year 
1929-1930, Compared with the Estimated Aver- 
age Monthly Requirements 196 

XXX. Cumulative Percentages of Males in the Popula- 
tion of Indianapolis and of Males Employed by 
Six Indianapolis Firms by Age Groups 197 

XXXI. Percentage of Urban and Rural Population in the 

United States, 1890 to 1930 197 

XXXII. Percentage of Total Persons Receiving Poor Re- 
lief in Poor Asylums and from Township Trustees 
(Outdoor Relief) in Indiana in Specified Years 197 
XXXIII. Inmates in State Penal and Correctional Institu- 
tions per 100,000 Population, September 30, 1929 198 

XXXIV. Expenditures of the State Government of New 

York by Groups, Percentage Going to Each, 1920 198 

XXXV. An Array of the Ages of 100 Felons Selected at
Random from Cases Disposed of by the Marion 
County, Indiana, Criminal Court in 1930 204 

XXXVI. Location of the Mode by Successive Regrouping of 

Ages of Felons 205 

XXXVII. Unemployed Male Workers in Boston by Age 

Groups, April, 1930 209

XXXVIII. Cumulative Frequencies, Unemployed Male Work- 
ers in Boston 212 

XXXIX. Computation of the Mean by the Long Method 
for Grouped Data: Unemployed Workers in 
Boston, Total 21,262 215 

XL. Computation of the Mean by the Short Method 218 
XLI. Computation of the Weighted Mean Index Number 
for the Number of Clients under the Care of 
Public Welfare Agencies in Indiana, September 
30, 1930. Base, 1913 220 

XLII. The Geometric Mean of Unemployed Workers in 

Boston Computed with the Use of Logarithms 223 




XLIII. Distribution by Ages of Parolees, Classified by

Total and by Success 227 

XLIV. Earnings of Chief Wage Earners in Families 229 

XLV. Percentile Distribution of Infant Mortality 

Rates in 108 Cities in the United States, 1929 235 

XLVI. Computation of the Average Deviation from Un- 
grouped Data: Amounts of Relief per Relief 
Case in 20 Family Relief Agencies in July, 1931 238 

XLVII. Computation of the Average Deviation from the 
Mean and from the Median for the Ages of Un- 
employed Workers in Boston 240 

XLVIII. Computation of the Average Deviation for the

Same Data by Short Method 241 

XLIX. Computation of the Standard Deviation of the 
Ages of Unemployed Workers in Boston by the 
Long Method 243 

L. Computation of the Standard Deviation of the 
Ages of Unemployed Workers in Boston by the 
Short Method 245 

LI. The Relative Values of Three Measures of Dis- 
persion 246 

LII. Unemployed Male Workers in Chicago at the Time
of the Census in April, 1930, According to Age. 

Class A 251 

LIII. Ratio of Males per 100 Females Admitted to Hospitals
for the Insane by States in 1927 252

LIV. Amount of Relief per Allowance Case in Three 

New York Family Relief Agencies 262 

LV. Amount of Relief per Allowance Case in Three 
New York Family Relief Agencies and the Rela- 
tives Based upon 1927 263 

LVI. Average Monthly Allowance Case Load of Agen- 
cies, and the Weights Expressed as Percentages 
of the Total Case Loads 264 

LVII. Computation of Index Numbers by the Method of 
Weighted Aggregates from the Allowance Case
Data 265 

LVIII. Computation of Index Numbers by the Method of 
Average of Relatives Weighted from the Allow- 
ance Case Data 267 




LIX. Computation of Index Numbers by the Method of 
the Geometric Average of Relatives from the 
Allowance Case Data 269 

LX. Comparison of Indexes for Allowance Cases Com- 
puted by Different Methods 272 

LXI. Cost of Maintenance of State Institutions in In- 
diana, 1900-1930, in Actual Dollars 274 

LXII. Number of Mental Patients in State Hospitals in 

the United States in Specified Years 275

LXIII. Persons under Care and Cost of Maintenance of 
the Principal Public Welfare Agencies and In- 
stitutions in Indiana, 1920-1929 276 

LXIV. Distance of a Body from the Starting Point, if It 
Moves at the Rate of 5 Feet per Second, at Speci- 
fied Seconds 280 

LXV. Misdemeanant and Felon Rates by Census Tracts, 

Indianapolis 283 

LXVI. Computation of Values (Misdemeanant and Felon 
Rates) for Determining the Line of Least 
Squares 285 

LXVII. Values of Y Estimated from Values of X and the 
Difference between the Actual and the Esti- 
mated Values 286 

LXVIII. Per Cent of Land Used for Business Purposes and

Felon Rate with Computations 290 

LXIX. Actual Values of Y, Estimated Values of Y, and 

the Residuals 293 

LXX. Computation of Values for Fitting a Simple Parab- 
ola — Crime Data 294 

LXXI. Computation of Values for Determining the Co- 
efficient of Correlation 298 

LXXII. Computation of Group Averages to Indicate the 

Form of the Regression Curve — Crime Data 302 

LXXIII. Computation of Quantities for the Residuals and 
the Standard Deviation for Curvilinear Corre- 
lation — Crime Data 304 

LXXIV. Correlation of the Sex Ratio and the Marriage of 

Women 307 

LXXV. Divorced Persons per 1,000 Females 15 Years of 

Age in Certain Census Tracts of Indianapolis 311




LXXVI. Amount of Relief per Relief Case and Amount of 
Relief per Allowance Case in 20 Relief Agen- 
cies, September, 1931 312 

LXXVII. Police per 1,000 Population and Crimes per 1,000

Population in 30 Cities, October, 1931 312 

LXXVIII. Index of Educational Interest and Index of Illit- 
eracy, 36 Texas Counties, 1920 313 

LXXIX. The Number of Males per 100 Females and the

Per Cent of Women Married in 170 Cities 315

LXXX. Comparison of Actual and Theoretical Success 

Frequencies in 4,096 Throws of 12 Dice 321 

LXXXI. Computation of Values Required for the Deter- 
mination of Moments — Intelligence Test Data 325 
LXXXII. I.Q’s of 1,671 Children, Ages 6 to 12 328

LXXXIII. Fractions of Sigma, Ratio of y to y₀, and Theoreti-
cal Frequencies for the Normal Curve 328 

LXXXIV. Computation of Theoretical Frequencies for 1,671

I.Q’s 331 

LXXXV. Differences between Actual and Theoretical 

Frequencies 334 

LXXXVI. Computation of Chi-Square 335 

LXXXVII. Hourly Production and Frequency of Production

in Each Interval — Button Workers 341 

LXXXVIII. Divorces per 100,000 Population in Indiana, 1899

to 1928 347 

LXXXIX. Moving Averages of Divorce Rates 349 

XC. Fitting a Straight Line to the Divorce Data 353 

XCI. Computation of Parabolic Curve 355 

XCII. Computation of Logarithmic Curve 357 

XCIII. Comparison of Trend Values Derived by a 7-Year 
Moving Average, a Straight Line, a Second De- 
gree Parabola, and a Logarithmic Curve 358 

XCIV. Mortality Rates in Indiana, 1911-1930, Expressed
as Percentages of the Mean Monthly Rate in
1911 361

XCV. Multiple Frequency Table of Mortality Rates 

Showing Seasonal Variations 362 

XCVI. Computation of Seasonal Indexes for the Mor- 
tality by Method (i) 363 




XCVII. Monthly Averages of Mortality Indexes Corrected 

for Secular Trend 364 

XCVIII. The Middle Four Mortality Rates for Each 

Month of the Year and Their Mean 366 

XCIX. Mean-Median Rates Corrected for Trend, Ad- 
justed Seasonal Indexes, and Variations from 
Monthly Average of 100 367

C. Seasonal Indexes Computed by the Ratio-to-Ordi- 

nate Method 369 

CI. Three Seasonal Indexes Compared — Corrected for

Secular Trend 369 

CII. Computation of Cyclical Variations for Annual 
Mortality Indexes Centered in the Middle of 
the Year 371 

CIII. Computation of Cyclical Variations of the 

Monthly Index by Months 374 

CIV. Transformation of Cyclical Variations in Units 

of the Variable to Units of Standard Deviation 375 
CV. Correlation of Phthisis Death Rates and the Busi- 
ness Cycle, 1875 to 1894, for England and Wales 378 
CVI. Correlation of Phthisis Death Rates and the Busi- 
ness Cycle, 1875 to 1894, for England and Wales 
— Phthisis Death Rates Lagged Two Years 380 

CVII. Active Cases of the Indianapolis Family Welfare 

Society by Years 382 

CVIII. Population of the United States at Each Census, 

1790 to 1930 382 

CIX. Active Case Load, Indianapolis Family Welfare 

Society, 1924 to 1931, by Months 382 

CX. Census of Indianapolis by Age Groups 390 

CXI. Birth Rates, Excluding Stillbirths, in the Regis- 
tration Area of the United States 393 

CXII. General Death Rates for the Registration Area 

of the United States, 1919 to 1928 395 

CXIII. Standard Million of Actual Living Persons (Both 

Sexes) in the United States, 1910 396 

CXIV. Specific Death Rates in Indianapolis, September 1,

1930, to August 31, 1931 397 

CXV. Expected Deaths in Indianapolis, September 1,

1930, to August 31, 1931 398




CXVI. Population of New York City, 1900 to 1930 401 

CXVII. Persons Out of a Job, Able to Work, and Looking

for a Job, Class A, Illinois, April, 1930 401 

CXVIII. Births in Indiana, 1928 to 1930, by Months. Population
of Indiana: 1928, 3,176,000; 1929, 3,207,689;
1930, 3,238,000 402

CXIX. Deaths from All Causes in the United States, 1914 
to 1928, and the Estimated Population of the 
Registration Area 403 

CXX. Deaths in the United States in Five-Year Inter- 
vals, 1928, and the Estimated Population in Each 
Interval for the Registration Area 403 

CXXI. Ordinates of Normal Probability Curve 427 

CXXII. Fractional Parts of Total Area under Normal 

Probability Curve 428 

CXXIII. Tables of the Chi-Function for the Pearson Chi 

Test 429 

CXXIV. Table of Squares, Square Roots, and Reciprocals, 

I to 1,000 436 

CXXV. Common Logarithms and Proportional Parts 446 



Part One 


INTRODUCTION 




CHAPTER I 


Social Problems 
and Social Statistics 


1. STATISTICS AS DATA AND AS METHOD

Social statistics are data which occur in human society, and social 
statistics is a scientific method. The worker in the social sciences is 
concerned with both the data and the method. Great masses of 
social data are received, tabulated, and filed by public and private 
agencies every year, but up to the present time relatively little 
systematic use has been made of these collections either for scien- 
tific or for administrative purposes. Social statistics as a method of 
analysis leading to understanding and control is in about the same 
stage of development as accounting in business was fifty years ago, 
when the old-fashioned bookkeeper recorded receipts and disburse- 
ments, made a balance sheet and called the matter closed. Today 
accounting requires the recording of facts which its bookkeeping 
predecessor would have excluded as irrelevant, because accounting 
is now concerned with unit costs, rates of production, sales per 
employee, capital depreciation, gross profits, net profits, etc., as 
interrelated factors which are of primary importance to the success 
of business. When social statistics analyzes the data recorded by an 
agency, of whatever sort, from every point of view to determine 
the effectiveness of the institution in the light of its own reports, 
it is doing what might be called social accounting. The business 
accountant records facts, and then applies statistical methods, suited 
to his purpose, to appraise the business. Too often social agencies, 
public and private (and here the educational system is regarded as 
a social agency), record many facts, assemble them in tables, pub- 
lish or file the assembled data, and carry the work no further. But 
this is just the point at which the serious work of the social statis- 
tician becomes interesting and takes on significance. 



Effective statistical work in social institutions requires adequate 
reporting of essential facts, and then the continuous and systematic 
analysis of these facts. The United States Bureau of the Census is 
maintained as a fact-collecting agency, and its primary responsi- 
bility ends with the collection, tabulation, and publication of these 
facts, though actually the Bureau analyzes some of its own mate- 
rial and occasionally issues monographs of first-rate importance. 
The latter, however, is a secondary function. The Bureau is not a 
functional agency in the same sense that state departments and city 
bureaus are. A state department of public welfare, a city park 
board, a community chest, or a school board exists to carry on 
definite service to the state or community. Its function is primarily 
administrative, not fact collecting. But it must have facts upon 
which to base the policies that underlie efficient administration, 
and administration will be much less efficient if these facts are not 
the subject of continuous analysis by a competent statistician. If the 
annual reports of departments of public welfare and school boards 
consisted in part of careful analyses of the data reported, they 
would make exciting reading for the public and would enlighten 
the administrators on many points. The great masses of official and 
quasi-public data assembled every year will yield up their meaning 
only after careful study; they are too complex to be interpreted 
by rule-of-thumb or impressionistic methods, such as are now com- 
monly in vogue among administrators. 

But social statistics requires more than periodic reports and 
analyses, if it is to perform for social science and social administra- 
tion a function comparable to experimentation in the natural sci- 
ences. Hand in hand with reporting facts goes the judgment as to 
what is significant. Statistical data are currently or periodically col- 
lected to afford a measure of the magnitude of the problems dealt 
with and to guide the administrator in the direction of greater 
efficiency and social effectiveness. Other data which bear upon 
causation may be equally important, if control over conditions is 
sought. Consequently, it may be asserted that a social statistician 
must know the field of his operations as well as statistical methods 
— should even be master of his field of interest before he is con- 
cerned with statistics. A mathematician possessed of the most re- 
fined statistical technique could not make a significant analysis of 
crime data unless he had studied crime and learned what factors 
are probably significant in its causation, control, or prevention. The 
social statistician must know the subject which is to occupy his interest
and must employ his statistical technique to extract meaning
from it and to measure trends, variations, and relationships. 

It is the purpose of this chapter to indicate specific uses of 
statistical methods in the field of social problems. The presentation 
will of necessity be brief, but it will cover in summary form the 
following points under each problem discussed: (1) data relating
to the occurrence of the problem in time, place, and population 
group; (2) data concerning the magnitude of the problem; (3) 
data concerning administration and its efficiency; (4) data con- 
cerning possible causes; (5) data concerning social control of the 
conditions. 


2. GENERAL EDUCATION 

The systematic transmission of the culture of the adult genera- 
tion to children is the problem of the free public schools. The 
culture which the schools attempt to pass on includes knowledge, 
skills, and attitudes. It is the most gigantic social problem with 
which each generation has to deal, and one which comprehends 
every child born or brought into the United States; it is the social 
problem for which the most elaborate machinery has been devised. 
Tons of paper bearing educational statistics are filed every year. 
For many purposes these data would be inadequate and would 
have to be supplemented, but they are adequate for other purposes 
if they are analyzed and have the juice squeezed out of them. 
These data bear upon social problems other than education, because 
the assimilation and utilization of culture may have manifold 
effects. 

Education is the concern of every citizen, urban and rural, rich 
and poor, educated and ignorant. The problem of insuring every 
child a minimum of free public education is tremendous. All large 
cities in this country now conduct their schools for nine or ten 
months each year, but in rural communities this long period is 
often not achieved. The property per capita available for taxation 
is lower in rural communities. There is also a wealth differential 
among urban communities. Communities with relatively low per 
capita wealth must levy high taxes, or they must have state or 
federal aid. Almost all states have a public school fund in which 
local communities share according to the number of children of 
school age, but this kind of aid does not equalize the opportunity 
of all children to receive education because many communities are
handicapped by low per capita wealth. Only a system of special
state or federal aid can equalize educational opportunity. Some
states provide such special aid. In the determination of the amount 
of aid required and the places requiring it statistical information 
is indispensable, and competent analysis of such information is no 
less necessary. It is the problem of state departments of education 
to determine where special aid is required to bring local schools up 
to standard achievement. The communities suffering from inade- 
quate schools change from one year to another. Children have 
varying ability to profit by school education; consequently, within
a city or a rural county there arises the problem of provision for
atypical children. Surveys have shown that whole counties may
have a disproportionately large number of mentally handicapped 
children. Hence, the composition of the population is an important 
factor in school administration. Only continuous study of such 
economic and social problems by one trained in the methods of 
research can insure its efficiency. 

The magnitude of the national education problem is indicated 
by the fact that in 1930 there were approximately thirty-five mil- 
lion children between five and twenty years of age. To educate this 
number requires about a million teachers, besides many thousands 
of administrators, research people, and clerical workers. In the 
school year, 1927-28, there were in continental United States 
257,251 public school buildings, and the value of school property 
was $5,423,280,092.¹ Such a stupendous undertaking can go for-
ward with any degree of satisfaction to the public only on a basis
of sound statistical data and careful study and planning. Of course
the administration of this vast institution is divided among states,
counties, and local communities, but even they must rely upon
statistics for guidance. Where state financing and supervision are
factors, the quantity of data required is large, and the difficulties
of interpreting them are great.

¹ World Almanac, 1931, p. 402.

Administration of a school system, whether by state, county, or
city, requires an understanding of statistical methods and the ability
to draw conclusions and formulate policies from masses of data.
Periodically a city school administration has a survey made to take
stock of its routine and social efficiency. Such sporadic surveys are
implied confessions that the school system has not collected cur-
rently all the data necessary for statesmanlike administration, or
that competent statistical service was lacking, or both. City school
systems are making increasing use of statisticians, but some of the
annual reports still bear witness to the lack of appreciation of the
value of such service in administration. This, however, is entirely 
aside from the broader social questions. The relation of schools to 
delinquency, to utilization of leisure time, to health education, to 
morbidity, etc., is a fact in which the community is vitally inter- 
ested, but, when such problems are considered, they are usually 
analyzed in some survey report instead of being analyzed from 
routine research, which is of more value to the community. Even 
a city school system which has a good social service department 
makes no annual analysis of its data; the records are filed, and 
only individual cases ever come to light to affect administration — 
in spite of the fact that the first principle of good statistical pro- 
cedure is that a generalization can be made only from a considera- 
tion of all cases or of a representative sample. 

Public education is a social problem because group conflict and 
differences in economic status and individual differences in ability 
exist in every community. Within the school system administrative 
problems may appear to be purely professional matters, but the 
way in which they are worked out affects the education of the child 
and, consequently, the community. The good school administrator 
is interested in every social relation of the school and he can judge 
the trend of development in these relations only through statistics 
currently analyzed and interpreted. He can gain control over de- 
velopmental tendencies either by shrewd guessing or by scientific 
study of the social and technical facts. Shrewd guessing is still the 
more common practice, but in some school systems steps are being 
taken to direct public education on the basis of continuous statistical 
analysis of facts. 


3. EMPLOYMENT 

Employment which is economically useful for all able-bodied 
adult members of the population is an American ideal which goes 
back to the earliest colonies. It becomes a social problem because 
American democracy desires full opportunity for each individual 
to earn his own and his family’s way in the world in the occupation 
for which he is best fitted, and also because our economic system 
is such that many men and women have to endure involuntary 
unemployment at various times. This may be seasonal in a certain 
locality because of the nature of the occupations available, and this 
kind of unemployment tends to recur every year at the same time. 
A business depression causes what is known as cyclical unemployment,
when workers are laid off for months at a time. Cyclical
unemployment is usually national or international in extent. Rapid 
changes in machine production and in administrative organization 
create what is called technological unemployment. This form may 
occur any time in a factory, a department store, or on a farm, if 
labor-saving machinery or new administrative devices are intro- 
duced. To keep people employed at productive labor is the positive 
way of stating the problem of unemployment; relief for the unemployed
and the prevention of unemployment is the negative way,
but the latter is the more common approach to the problem. Many 
social problems for the individual, the family, and the community 
arise in the wake of unemployment. 

Although seasonal and technological unemployment occurs every 
year and cyclical unemployment every six to ten years, statistics 
are not available to indicate the magnitude of the problem. The 
community has given little attention to the first two types, except 
to maintain agencies for charitable relief; but when a serious business
depression occurs, the distress caused by unemployment fo-
cuses a great deal of attention on the problem. Yet in previous 
depressions the estimates of the number unemployed in the country 
have varied by millions, a fact which merely emphasizes the dearth 
of statistics bearing upon a problem that can be dealt with ade- 
quately only when reasonably dependable data are available. The 
United States Bureau of the Census undertook a census of the 
unemployed in 1930, but the returns were so much in dispute 
that it is not known whether this census actually indicated the 
magnitude of cyclical unemployment in April, 1930. A few cities 
have made estimates which may be fairly accurate. The Indianap- 
olis Commission for Stabilization of Employment estimated that 
the percentage of the employable population in that city who were 
employed declined from 97.2 per cent, March 31, 1930, to 78.1 
per cent, December 31, 1930. Some of those who were unem- 
ployed December 31 were undoubtedly out because of the normal 
seasonal drop in employment, but the data available are in- 
sufficient to compute a seasonal index of employment which should 
be deducted from the total unemployed, to arrive at the number 
out of work because of the depression. However, cyclical unem- 
ployment usually involves millions of families, and the magnitude 
of the problem is reflected in the rising amounts paid out for 
charitable relief. But to the social statistician the important point 
is that statistics are inadequate in quantity and dependability. 
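
The deduction described in this paragraph can be sketched in a few lines of Python. The two employment percentages are those reported by the Indianapolis Commission; the seasonal figure is purely hypothetical, since the text itself notes that the data needed to compute a real seasonal index were not available. The function names are illustrative only.

```python
# Percent of the employable population employed (Indianapolis Commission figures).
EMPLOYED_MARCH_1930 = 97.2
EMPLOYED_DECEMBER_1930 = 78.1

# HYPOTHETICAL: an assumed normal seasonal unemployment for December, in percent.
# No such index could be computed from the data then available.
ASSUMED_SEASONAL_DECEMBER = 4.0


def unemployed_percent(employed_percent: float) -> float:
    """Percent unemployed is the complement of percent employed."""
    return round(100.0 - employed_percent, 1)


def cyclical_unemployment(total_unemployed: float, seasonal_unemployed: float) -> float:
    """Deduct the normal seasonal share to estimate depression unemployment."""
    if seasonal_unemployed > total_unemployed:
        raise ValueError("seasonal share cannot exceed total unemployment")
    return round(total_unemployed - seasonal_unemployed, 1)


march = unemployed_percent(EMPLOYED_MARCH_1930)
december = unemployed_percent(EMPLOYED_DECEMBER_1930)
cyclical_december = cyclical_unemployment(december, ASSUMED_SEASONAL_DECEMBER)

print("Unemployed, March 31, 1930 (per cent):", march)
print("Unemployed, December 31, 1930 (per cent):", december)
print("Cyclical share under the assumed seasonal figure:", cyclical_december)
```

Only the first two results rest on reported data; the third changes with whatever seasonal figure is assumed, which is precisely why a dependable seasonal index is needed.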





Employment administration as a public responsibility is only 
beginning to receive attention. Some of the large cities operate 
free employment exchanges; these are of great value in diminishing
the period of idleness of the individual worker who is unem-
ployed on account of seasonal and technological changes, but of 
small utility in a general depression. A national system of free 
employment exchanges is probably coming into existence, but the 
effectiveness of its administration will depend as much upon cur- 
rent statistics of employment in the locality, the state, and the 
nation as upon organization and trained personnel. Efficient employment
administration requires detailed statistics, currently collected,
concerning seasonal variations and technical changes in all
important businesses in the city. Cyclical unemployment can be 
dealt with effectively only when the organized community has 
sufficient current information to anticipate increasing general un- 
employment some time in advance, and can promptly set in motion 
public works, emergency work, and other relief measures of such 
comprehensiveness that the volume of unemployment will not 
demoralize the community and the families of the unemployed. 

While the causes of seasonal and technological unemployment 
are fairly well known, there is much debate concerning the causes 
of cyclical unemployment. Seasonal lay-offs occur because the buy- 
ing habits of the public concentrate purchases of certain commodi- 
ties at particular seasons, because raw materials are available only 
at certain times and may be perishable, because outdoor work in 
the winter is difficult and inefficient, because second-line industries 
sell their products to primary industries which have seasonal varia- 
tions, and because some producers are in the habit of speeding up 
for a part of the year and slowing down at other times. An under- 
standing of these conditions and methods of removing them re- 
quires more information than is now available, and more intensive 
and comprehensive analysis. A single new industry of large pro- 
portions, such as the automobile industry, so affects the whole 
economic system that it is necessary for every community to have 
current statistics bearing on the problems of unemployment in 
order to know the causes of a given condition. 

Social control of the conditions leading to unemployment or to 
the alleviation of its effects is not far advanced. Stabilization of 
employment through the efforts of corporation executives has re- 
duced the number of seasonal lay-offs in particular businesses and 
offers one way for further advance. The free employment exchange
is about the only agency yet developed which can readjust
workers in new jobs when they are thrown out of work by tech- 
nological conditions. No control over conditions leading to cyclical 
unemployment exists and here only relief measures are available. 
England and Germany have set up systems of unemployment in- 
surance, benefits from which are available to anyone who is un- 
employed and cannot find work provided he is in the categories of 
the workers insured. If such a measure should be adopted in the 
United States, it would at the very outset require vast information 
to work out the plan on a sound actuarial basis. Once put into
operation, it would take a small army of statisticians to keep up 
with the collection and analysis of data. Unemployment, more than 
any other social problem, requires the use of statistics and statisti- 
cal methods, if it is to be handled in a statesmanlike manner. 

4. POVERTY 

“The condition of poverty obviously attends every person who 
habitually lacks the means to sustain himself on such a footing of 
physical fitness as will enable him to carry on effectively for him- 
self and his legal dependents. Such a person may not be in abject 
want and yet be in poverty. He may be a laborer whose weekly 
wage is barely sufficient to sustain life, leaving no margin for 
advancement. He is not in danger of immediate death from starva- 
tion, but he lacks enough to maintain a permanent and reasonable 
standard of physical fitness.”² Poverty is thus defined in terms of
physical health. Of course, economic standards of living change, 
and probably the concepts of physical fitness vary also. But in 
defining poverty in terms of physical health a more objective ap- 
proach to the problem is insured. Poverty is the usual condition 
of what is sometimes called the “submerged tenth.” According to 
this opinion, if the economic status of any large number of people 
in a given geographic area were known, it would include those in 
poverty as about ten per cent of the total. Where the foreign-born 
and the Negroes constitute a large proportion of the population, 
the incidence of poverty is probably much greater. Its occurrence is 
more obvious in certain districts of large cities than in rural areas, 
but this may be only apparent because so many poor people live 
close together in cities, the laboring population tending to live as 
near their work as possible to save car fare and because house rents
are low in areas contiguous to industry. Poverty is more common
among industrial laborers and farmers than in any other occupa-
tional group.

² Kelso, Robert W., Poverty, p. 3. New York: Longmans, Green & Company,
1929.

That the problem of poverty is one of great magnitude is indi- 
cated by the large sums of money spent every year in poor relief, 
and the large numbers of persons receiving such relief. In February,
1929, fifty private relief agencies distributed $514,007 to
21,069 cases (in the majority of these a “case” is a family), and
the same agencies distributed $3,986,958 to 221,550 cases in
February, 1932.³

Current indexes of general business conditions in February,
1929, were a little above normal for that month, and considerably 
below normal in February, 1932. Possibly the difference between 
the number of relief cases handled by these agencies is some indi- 
cation of the numbers of people who live in poverty but who do 
not require charitable aid except in times of business depression. 
Another indication of the number of the poverty-stricken requiring 
aid is given by reports of public poor relief in certain states. Be- 
tween April 1, 1928, and March 31, 1929, Massachusetts spent 
$12,851,771.51 for the aid of 149,523 persons. For each thousand 
of the population 37.21 persons received public aid, or 3.72 per
cent.⁴ In Indiana, for each thousand of the population (estimate),
43.05 persons, or 4.31 per cent, received outdoor relief in the 
fiscal year ending September 30, 1929, and in poor asylums 2.07 
persons (includes holdovers, new admissions, and readmissions) 
per thousand population were given aid during the fiscal year ending
August 31, 1929. The total cost of these two kinds of poor
relief was approximately $2,862,500.⁵ Besides these public charities
illustrated by Massachusetts and Indiana, there are many
private agencies giving relief, and many other agencies give serv- 
ices to people unable to pay for them. The figures mentioned here 
suggest the magnitude of the problem of poverty, but they do 
not give any exact measurement. A problem so great and so ex- 
pensive should warrant more complete statistical records and a 
more systematic analysis of the data. 
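
The arithmetic behind these figures is simple enough to check directly. The short Python sketch below uses only the numbers quoted in this section; the function names are illustrative, and the implied population is merely what the stated rate and count together suggest, not a census figure.

```python
# Figures quoted in the text for fifty private relief agencies and for Massachusetts.
PRIVATE_RELIEF = {
    "February 1929": {"dollars": 514_007, "cases": 21_069},
    "February 1932": {"dollars": 3_986_958, "cases": 221_550},
}
MASSACHUSETTS = {"dollars": 12_851_771.51, "persons": 149_523, "per_thousand": 37.21}


def average_grant(dollars: float, cases: float) -> float:
    """Average relief distributed per case (or per person aided)."""
    return dollars / cases


def rate_to_percent(per_thousand: float) -> float:
    """A rate per thousand of population expressed as a percentage."""
    return per_thousand / 10.0


def implied_population(persons: float, per_thousand: float) -> float:
    """Population base implied by a count and its rate per thousand."""
    return persons * 1000.0 / per_thousand


grant_1929 = average_grant(**PRIVATE_RELIEF["February 1929"])
grant_1932 = average_grant(**PRIVATE_RELIEF["February 1932"])
case_growth = (
    PRIVATE_RELIEF["February 1932"]["cases"] / PRIVATE_RELIEF["February 1929"]["cases"]
)
mass_percent = rate_to_percent(MASSACHUSETTS["per_thousand"])
mass_population = implied_population(
    MASSACHUSETTS["persons"], MASSACHUSETTS["per_thousand"]
)
mass_per_person = average_grant(MASSACHUSETTS["dollars"], MASSACHUSETTS["persons"])

print(f"Average grant per case, February 1929: ${grant_1929:.2f}")
print(f"Average grant per case, February 1932: ${grant_1932:.2f}")
print(f"Growth in number of cases: {case_growth:.1f} times")
print(f"Massachusetts, per cent of population aided: {mass_percent:.2f}")
print(f"Massachusetts, implied population: {mass_population:,.0f}")
print(f"Massachusetts, cost per person aided: ${mass_per_person:.2f}")
```

On these figures the average grant per case fell from roughly $24 to roughly $18 while the number of cases rose more than tenfold, a comparison that the raw totals alone do not make obvious.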

³ Published reports of the Russell Sage Foundation.

⁴ Annual Report of the Massachusetts Department of Public Welfare, 1929,
pp. 132-134.

⁵ Indiana Bulletin of Charities and Correction, No. 182, pp. 203, 204, 302, 303,
and Nos. 183-184, p. 365.

Poor relief is administered as a dole system by public agencies;
the exceptions in which the principles of social case work are em-
ployed are too few to make much difference. The private relief 
agencies are increasingly giving relief only as a part of the process 
of rehabilitation, and it is these agencies which have seen the im- 
portance of fuller records and of employing statisticians in their 
work. Public relief of poverty, as it is now administered, is gen- 
erally believed to contribute to pauperism. Whether it does or does 
not is a matter to be determined by more complete data and their 
analysis. In Indiana, the trend of public poor relief in proportion 
to population, for the past thirty years, has been upward. Does 
this reflect a more liberal policy on the part of overseers of the 
poor? Or does it reflect a growing class of poverty-stricken citizens? 
Statistical research would help to answer these questions. 

Information on the causes of poverty exists in the records of 
public and private relief agencies, but it has not been studied 
scientifically to any important degree, although case studies and 
some efforts at statistical summary and analysis have been made 
by individual social agencies. But the question narrows down to a 
judgment as to whether poverty is due primarily to personal in- 
adequacy in modern civilization or to defects in economic organiza- 
tion. Low wages in large families is undoubtedly a factor, because 
a business depression sends to relief agencies many persons who do 
not require such aid under ordinary circumstances. Low mentality 
seems to play an important part as a cause of poverty, and per- 
sonality disorders come in for consideration. Disasters, accidents, 
and illness precipitate people into poverty. No doubt, the relative 
importance of these factors varies in different localities. Because 
this is true, continuous local records and their systematic analysis 
are fundamental to a comprehensive understanding of the specific 
causes of poverty. 

Control over the conditions which lead to poverty waits upon 
more certainty concerning these conditions. Increased wages and 
stabilization of employment might improve the situation; the early
detection of physical defects and peculiarities of personality might
help in the control of two groups of cases; segregation or sterilization
of the feeble-minded would prevent this class from rearing
families in poverty. But all such efforts at control depend upon 
more careful scientific work than has yet been done in the field of 
social problems and social work. Relief, unemployment insurance, 
pensions, and made-work are palliatives to minimize the distress
of the victims of poverty; they are not means of control over
causes. 


5. OLD AGE 

Dependent old age is increasingly a social problem. The pro- 
portion of aged persons varies in time and place and according to 
ethnic composition of the population. In 1850 the percentage of 
males sixty years of age or over was 4.0 and of females 4.2, but 
in 1920 the percentage of males in this age group was 7.4 and of 
females 7.5. The relative number of the aged in the United 
States has almost doubled in seventy years. The percentage of
persons sixty-five years of age or over in 1920 varied from 3.4 
per cent in the West South Central States to 5.8 per cent in the 
New England States. By ethnic composition the percentage of 
persons sixty years of age or over varied as follows in 1920: native 
white parents, 8.1 per cent; foreign-born parents, 4.9 per cent; 
mixed parents, 6.1 per cent; foreign-born, 14.9 per cent; Negro, 
5.1 per cent. Thus it will be seen that the determination of the 
occurrence of aged persons is a statistical problem itself, and, when 
it is related to social factors, dependent old age becomes an ex- 
ceedingly intricate one. 
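
The statement that the relative number of the aged has almost doubled can be verified from the percentages quoted above. The following Python lines are a minimal sketch using only those figures; the names are illustrative.

```python
# Percentage of each sex aged sixty or over, as quoted in the text: (1850, 1920).
SHARE_SIXTY_AND_OVER = {
    "males": (4.0, 7.4),
    "females": (4.2, 7.5),
}


def growth_ratio(earlier: float, later: float) -> float:
    """Ratio of the later share to the earlier share; 2.0 would be a doubling."""
    return later / earlier


ratios = {
    sex: round(growth_ratio(share_1850, share_1920), 2)
    for sex, (share_1850, share_1920) in SHARE_SIXTY_AND_OVER.items()
}

for sex, ratio in ratios.items():
    print(f"{sex}: share in 1920 was {ratio} times the share in 1850")
```

Both ratios fall somewhat short of 2.0, which is consistent with the wording "almost doubled."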

To a considerable extent dependent old age as a problem to the 
nation and to local communities arises from income inadequate to 
permit saving for old age, to financial inability of adult children 
to take aged parents into their homes, to the tendency of em- 
ployers to discriminate against older men, and to the fact that the 
percentage of the population above sixty years of age is steadily 
increasing. The magnitude of the problem is suggested by the 
percentages just given. It is further emphasized by the actual 
number of persons sixty-five years of age or over in the United 
States in 1930, which was 6,633,805. The problem is not mate- 
rially reduced when it is recognized that about half this number 
are women, since they must be taken care of either in their own 
homes or somewhere else, and it is a fact that their husbands are 
finding it increasingly difficult to find employment, if they are 
wage earners. 

That part of public administration which deals with the aged is 
concerned almost exclusively with relief. In sixteen states old age 
pension systems have been introduced. In some states the minimum 
age for eligibility is sixty-five and in others seventy, all states 
providing that persons eligible by age are not rendered ineligible
by the possession of more than the maximum of property allowed.
That old age pension systems are expensive is indicated by the fact 
that New York State spent about $12,000,000 for 1931, the first 
year of its operation. Other relief of the aged poor is left to the 
charitable agencies and to the poor asylums. In the private relief 
agencies rehabilitation of the aged is undertaken, and case-work 
treatment seems to give promise of good results. Public outdoor 
relief agencies simply dole out relief without any effort at con- 
structive work. The poor asylums are in fact custodial institutions 
in so far as the permanently incapacitated old person is concerned, 
and the private homes for the aged are of the same general char- 
acter, though they are generally better managed and more com- 
fortable. The poor asylums, of course, do not restrict admission to 
elderly persons, but in Indiana in recent years over two-thirds of 
the poor asylum population have been sixty years of age or over. 
Little effort is made anywhere in the country outside of the private
case-working agencies to prevent dependence in old age. Pre-
vention is left to the individual or to his family. Occupational 
adjustment might be possible on a much larger scale for the able- 
bodied person past sixty. Much more careful study of old age 
relief and of preventive possibilities needs to be undertaken both 
by departments of public welfare and by private agencies. 

The causes of old age dependence are believed to be many. 
They include illness, mental disorder, mental deficiency, personal 
improvidence, insufficient income in productive years, criminal be- 
havior in earlier life, and the disinclination of employers to take 
on elderly people. The relative importance of these factors in 
different localities is not known, and control of the conditions 
which lead to old age dependence requiring charitable relief or a 
pension cannot go far until the problem is better understood. The 
states which have adopted old age pension systems should become 
laboratories for the study of old age problems. It may be found 
that pensions encourage dependence and remove important incen- 
tives to self-maintenance or to cooperation in a plan of prevention. 
Careful records and thorough statistical analysis are indispensable 
prerequisites to the solution of this problem. 


6. DEPENDENT AND NEGLECTED CHILDREN 

A dependent child is one whose parents are dead or are in- 
capable of taking care of him and whose near relatives cannot 
assume responsibility for him. A neglected child is one whose
parents or near relatives do not give him the care he needs but
who may be financially able to do so. In such cases either a private 
children’s agency or the state undertakes the care of the child. No 
social problem receives more attention than that connected with 
children. This is true because the thought of children inadequately 
cared for arouses sympathy and immediate action, but also because 
as a practical matter the children who are neglected or lack the 
elemental necessities of childhood soon perish or grow into adult- 
hood with numerous handicaps. If a child cannot be reared satis- 
factorily in his own home, society reserves the right to make him 
a ward of the state and to supply as much as possible of what 
is lacking. 

In 1920 nearly one-third of the population of the United States 
was under fifteen years of age. This group furnishes the prob- 
lems with which child welfare efforts are concerned. Dependency 
and neglect are two of the most common problems. Probably they 
occur more often among the foreign-born and the Negro popula- 
tion than among other groups, though reliable statistics for the 
country as a whole are not available to show the exact condition. 
They seem to occur more often in the city than in rural com- 
munities, but this may be more apparent than real since facilities 
for detecting these conditions are more numerous and better or- 
ganized in cities. Some geographic divisions of the country report 
much larger rates of dependency and neglect than others, but in 
the absence of statistics to prove the point this fact may be assumed 
to reflect differences in standards of child care rather than differ- 
ences in the rate of occurrence of the problem. 

As compared with other social problems, the magnitude of the 
problem of dependent and neglected children can only be sug- 
gested. A report of the Bureau of the Census indicates something 
of the situation. 6 For every 100,000 of the total population of the 
United States in 1923, 198.7 dependent and neglected children 
were reported; but in New England the rate was 353.0, and in 
the West South Central States it was only 98.7. On February 1, 
1923, there were 148,979 children in institutions for dependent 
and neglected children or under the supervision of these institu- 
tions. On the same date 339 child-placing agencies reported 52,979 
children under their care. Children are continually coming under 
the care of such agencies, though, of course, other children are 

6 Children Under Institutional Care, 1923. Bulletin of the United States Bureau 
of the Census. 




continually being released. Between February 1 and April 30, 
1923, 1,558 institutions reported that they had received 9,198 
children, and 339 child-placing agencies received 7,181 in the same 
period. It is estimated by the Bureau of the Census that another 
group of children numbering about 121,000 is under care of 
mothers’ pension administration. Besides these agencies, there are 
day nurseries and certain institutions which receive pre-delinquent 
and mildly delinquent children on the same basis as dependent and 
neglected children. 
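
The rates quoted above are simple ratios of reported children to population, scaled to 100,000. As a minimal sketch in Python, using only the Census figures cited in the text, the regional comparison and the intake for the three-month period may be worked as follows. The function name `rate_per_100k` and the variable names are illustrative assumptions, and adding the two intake figures assumes that institutions and child-placing agencies did not report the same child twice.

```python
# Figures quoted in the text (Bureau of the Census, 1923).
RATE_NEW_ENGLAND = 353.0          # children under care per 100,000 population
RATE_WEST_SOUTH_CENTRAL = 98.7


def rate_per_100k(cases: int, population: int) -> float:
    """Return the number of cases per 100,000 of population."""
    return cases / population * 100_000


# How many times larger is the New England rate than the
# West South Central rate?
regional_ratio = RATE_NEW_ENGLAND / RATE_WEST_SOUTH_CENTRAL

# Children received between February 1 and April 30, 1923.
received_by_institutions = 9_198
received_by_agencies = 7_181
total_received = received_by_institutions + received_by_agencies

print(round(regional_ratio, 2))   # 3.58
print(total_received)             # 16379
```

The ratio of about 3.6 to 1 describes reported care only. As the text notes, it may reflect differences in standards of child care rather than differences in the occurrence of the problem.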

These figures suggest the magnitude of the problem, but they 
do not indicate the degree of efficiency in administration attained 
by institutions and child-placing agencies. Such data as a whole are 
entirely lacking. A few studies of limited extent have been made, 7 
but they do not reflect the results obtained throughout the coun- 
try. That is a technical problem requiring more data than are now 
available for analysis. Much research has been, and is being, done 
to determine the best ways of handling dependent and neglected 
children, but in the main this is case study which throws light only 
upon methods of individual treatment. The larger problem of the 
interrelationships of dependency and neglect with other social fac- 
tors has received much less attention, chiefly because technically it 
is a statistical problem. 

The proximate causes of dependency and neglect in individual 
cases are usually known, but why the social order should produce 
such pathological conditions has not yet been answered scientifically. 
This question becomes more difficult when it is known that the 
number of such children under care in proportion to population 
seems to be increasing. The determination of whether this indi- 
cates growing pathological conditions or more active response to 
the needs of children is a first-rate problem for social research. No 
control in the sense of preventing the conditions which give rise 
to dependency and neglect is possible until much more research 
has been done. Statistical records and statisticians are essential to 
the solution of much of this problem. Years will be required for 
the accumulation of facts. Current statistics gathered by depart- 
ments of public welfare should be studied as they come in, but so 
many factors are involved that a full understanding will be 
achieved only after observation of many annual series over a 
period of years. Time is itself an important factor, and that is 

7 See Van Theis, Sophie, How Foster Children Turn Out. State Charities Aid 
Association of New York, New York, 1924. 





perhaps one reason why the best results in the study of the prob- 
lem will be obtained by research workers who are regular mem- 
bers of departments of public welfare and who serve with an 
indefinite tenure of office. 


7. DIVORCE 

Divorce is becoming easier, and consequently more important as 
a social problem. The more applications for divorce there are, the 
more time the courts have to give to this type of litigation. When 
a married couple seeks to dissolve their marriage relations, the 
public is concerned not merely with the fact that a decree of di- 
vorce may be issued to the husband or the wife. Other matters of 
public importance are involved: the disposition of children, the 
division of family property, and the socio-psychological effects on 
the parties concerned. More than a third of the divorce cases in- 
volve children. It is generally believed, though not proved con- 
clusively, that children are socially handicapped if their parents are 
divorced; and there is considerable evidence that behavior problems 
develop in such children more readily than in children living 
with both parents. Slightly more than one-third of divorces are 
granted in the first five years of marriage, when there are no chil- 
dren or the children are small. The occurrence of divorce varies 
among states according to the liberality of the laws. No divorces 
are granted in South Carolina, but in Nevada they are granted 
freely. Texas, with a population only half as large as New York, 
has three times as many divorces. 8 Divorce occurs much less fre- 
quently among Catholics or Jews than among Protestants. Eco- 
nomic conditions so affect the divorce rate that in times of depres- 
sion there is a distinct drop, while in prosperous years the rate 
shows a marked rise. 

The magnitude of the social problem created by divorce can 
only be suggested. In 1928 the Bureau of the Census reported that 
195,939 divorces were granted, and the 1920 census showed that 
there were 508,588 divorced persons who had not married again. 
It is probable that the majority of persons who get divorces re- 
marry. So the census figures do not indicate the number of persons 
in the population who have at some time been divorced. The num- 
ber is much larger than that for divorced persons remaining un- 

8 See Marriage and Divorce, 1928. Bulletin of the United States Bureau 
of the Census. 





married. Ogburn estimates that 0.7 per cent of the total population 
was divorced in 1920. 9 

Judicial divorce statistics are inadequate. The courts are concerned 
with individual cases as they come through; certain information 
is obtained and filed. Statistical reports cover only a few 
items which are quite insufficient for either administrative or social 
purposes. The courts have not seen the value of statistical studies 
of their work, which would enable them to compare accurately the 
grist of one year with that of another. A research project in Ohio 
and Maryland is now being carried on by the Institute of Law of 
Johns Hopkins University for the purpose of determining what 
statistics are most valuable for judicial administration and for 
social reporting. When this question is answered, it still remains 
to get the plan of reporting adopted by courts and, most impor- 
tant, provision made by the judicial system for competent, current 
analysis of the data. 

The causes of divorce are not those alleged in the legal grounds 
for divorce. The complaint is made in a form which the com- 
plainant believes will meet the requirements of the law, but the 
specific circumstances which led the parties to decide to dissolve 
their marriage relations do not often appear. In a large proportion 
of the cases the causes lie in the peculiar personalities of the mar- 
riage partners. These factors are highly indefinite and difficult to 
express in statistical units. If the socio-psychological factors can 
ever be represented adequately by more objective factors, it may 
be possible to make a comprehensive statistical study of the causes 
of divorce. At present such a study cannot advance far. Social con- 
trol of divorce depends upon the law and the attitude of leniency 
or strictness shown by the judge. More adequate statistics of 
divorce, which would show clearly its effects, would be some guide 
to the future modification of the law. These have yet to be 
developed. 

8. CRIME AND DELINQUENCY 

Briefly, crime is a violation of the law. Delinquency is a term 
usually applied to young offenders, and may be a violation of the 
law or may be anything that might lead to an overt violation. Next 
to the problem of unemployment, crime is probably the most ex- 

9 Groves, E. R., and Ogburn, W. F., American Marriage and Family Relationships, 
p. 360. New York: Henry Holt & Company, 1928. 





pensive social problem confronting the nation. Efforts have been 
made to locate the occurrence of crime specifically in time, place, 
social strata, age and sex groups, and ethnic groups. Some success 
has attended research in certain cities in defining the geographic 
centers from which the bulk of crime springs; in all cities so far 
studied the concentration is just outside the main business district 
and in certain outlying industrial districts. The seasonal variations 
are less well defined, though certain types of crime appear to have 
regular ups and downs. The age and sex distribution is rather well 
known. Among prisoners, the ratio of males to females on January 1, 1930, was 
about twenty-four to one. 10 Among juvenile delinquents the ratio 
of males to females is about four to one. About one-half of the 
males and about two-thirds of the females are fifteen to seventeen 
years of age. 11 Statistics on the distribution of crime by social strata 
are insufficient to draw a conclusion, but it is probable that a dis- 
proportionately large number of criminals and delinquents come 
from the lower economic classes — those who would be unskilled 
or semi-skilled workers. The foreign-born and the native white of 
native parentage, when age and sex are held constant, probably 
show the lowest rates, while the Negro and the native white of 
foreign parents show higher rates. A few particular foreign-born 
groups appear to have high rates, however. Altogether, there is 
still a good deal of statistical work to be done in locating the oc- 
currence of crime and delinquency. 
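
A ratio such as twenty-four to one is often easier to compare when restated as a share of the whole group. The following is a minimal sketch in Python of that conversion, applied to the two sex ratios quoted above. The function name `share_from_ratio` is an illustrative assumption.

```python
def share_from_ratio(a: float, b: float) -> float:
    """Given a ratio a:b, return the fraction of the whole made up by a."""
    return a / (a + b)


# About twenty-four males to one female (the figure cited to
# the Census report on prisoners, January 1, 1930).
male_share_prisoners = share_from_ratio(24, 1)

# About four males to one female among juvenile delinquents.
male_share_juveniles = share_from_ratio(4, 1)

print(male_share_prisoners)   # 0.96
print(male_share_juveniles)   # 0.8
```

On these figures males make up about 96 per cent of the first group and about 80 per cent of the second.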

The magnitude of the crime problem, as reflected by estimates, 
is staggering. Much work has been done to produce standard crim- 
inal statistics, but much remains to be done. The number of con- 
victions by courts furnishes the most reliable statistics, but convic- 
tions are so small a percentage of crimes that they do not accurately 
represent the volume of crime. On January 1, 1930, there were 
120,496 inmates of penal and reformatory institutions, and the 
commitments in 1930 were 78,866. 12 Furthermore, a great many 
prisoners were on parole from the institutions, and many more 
convicted criminals were on probation. Hence, institutional sta- 
tistics are inadequate as a measure of the volume of crime. Similar 
statistics are available for juvenile delinquents. On January 1, 

10 Prisoners, 1930. Report of the United States Bureau of the Census. 

11 Children Under Institutional Care, 1923. Report of the United States Bureau 
of the Census. 

12 Op. cit. 





1923, 25,233 juvenile offenders were in institutions, and about 
that many more were admitted during the year. 13 These figures, 
even more than those for adults, fall short of reflecting the real 
situation, because many more juveniles than adults are put on 
probation. Aside from the social losses to the country through this 
volume of crime and delinquency, the cost of maintenance of police 
systems, courts, and institutions is stupendous, and to these costs 
must be added the destruction and theft of property by criminals. 

The administration of the courts, the police systems, and the 
institutions is the kind of social problem that calls for ample care- 
fully made statistical records and a systematic analysis. During the 
last few years the federal government and many of the states have 
appointed crime commissions to survey the situation. Such sporadic 
surveys have been made before, but they accomplish little other 
than tightening of the law and for a short time directing the 
attention of the public to crime. Administration requires continuous 
factual records, just as a corporation requires continuous accounting, 
and efficient administration requires at least annual, 
systematic analysis of the work of the year — not a popular report 
for the press but a report that is technically as competent as that 
presented by the officers of a corporation to the board of directors. 
Probably no court, police system, or institution in the country has 
an equally competent annual analysis of its work. Changes in the 
law, tightening court procedure, and more vigorous police efforts 
are usually the extent of the effects of a crime survey. Administra- 
tive efficiency requires police, judicial, and institutional accounting 
of a high order. 

Study of the causes of crime has hardly arrived at the stage of 
statistics except by indirection. Criminals seem to come from an 
economic class whose income is for the most part in the lowest 
quarter; they live prevailingly in neighborhoods which are un- 
desirable to most people for residential purposes; there seems to 
be a disproportionately large number with low-grade mentality 
and with mild or serious mental disorders; thwarting of personal- 
ity in childhood seems to be a causal factor; racial and ethnic 
discrimination seems to appear as a cause in some crime and 
delinquency. Studies of prisoners by Glueck, Vold, and Burgess 
have led to the development of scales, constructed from social 
background data, which indicate the expectancy of success or failure 
13 Op. cit. 





on parole. 14 If after sufficient use such expectancy tables prove to 
be a reliable guide, then it would seem that the chief causes of 
antisocial behavior have been found. Control over the conditions 
which develop criminals and over the reconstruction of behavior 
depends upon further statistical study of this kind. Administrative 
efficiency and control will very likely advance together. 

9. BIRTH AND DEATH RATES 

Birth and death rates are biological facts, but they are of im- 
portance to the scientific study of almost every social problem. 
From the rate of increase in population due to the difference be- 
tween the number of births and the number of deaths and between 
immigration and emigration school administrators can estimate 
the amount of equipment, the number of buildings, and the teach- 
ing staff that will probably be needed several years in the future. 
Specific birth rates vary in different social and ethnic groups, and 
they vary according to the age and sex composition of the popu- 
lation. Specific death rates in age and ethnic groups vary widely 
and are important in the study of public health work, employ- 
ment, and poverty. Death rates vary in certain geographical areas, 
even though the rates may be computed for a standard population. 
Births and deaths are universal phenomena, and wherever social 
problems exist they are factors that need to be taken into con- 
sideration. 

Births and deaths have been recorded for many years, but even 
now there are areas of the United States not included in the “reg- 
istration area.” Many counties do not have health officers, and in 
these counties reports of vital statistics are incomplete or wholly 
lacking. But the problem of getting data for computing birth and 
death rates is not as great as the problem of obtaining some other 
kinds of data. Reporting is fairly well standardized for births and 
deaths, and crude birth rates and general death rates are reason- 
ably reliable. This is not true of specific birth and death rates, 
however, because their computation requires detailed information 
regarding age, sex, and ethnic composition of the population 
which are available for the country only every tenth year, when 

14 Glueck, Sheldon and Eleanor T., 500 Criminal Careers, New York: Knopf, 
Chap. 18, 1930. 

Vold, G. B., “Factors Entering into the Success or Failure of Minnesota Men 
on Parole,” American Sociological Society Papers, May, 1930, pp. 167-169. 

Bruce, Harno, Burgess, and Landesco, Parole and the Indeterminate Sentence. 
Department of Public Welfare of Illinois, 1929. 





the census is taken — a few states take a census in the middle of 
each decade. Consequently, the composition of the population in 
intercensal years has to be estimated, and birth and death rates 
computed on the basis of these estimates are subject to consider- 
able error. 
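
The dependence of a rate upon an estimated population can be illustrated with a short calculation. The sketch below, in Python, estimates the population of an intercensal year by straight-line interpolation between two census counts and then computes a crude death rate per 1,000. The county, its population counts, and its deaths are hypothetical figures chosen for illustration, and straight-line interpolation is only one of several possible methods of estimation.

```python
def interpolate_population(pop_start: float, pop_end: float,
                           year_start: int, year_end: int,
                           year: int) -> float:
    """Straight-line estimate of population for a year between two censuses."""
    fraction = (year - year_start) / (year_end - year_start)
    return pop_start + (pop_end - pop_start) * fraction


def crude_rate(events: int, population: float, per: int = 1_000) -> float:
    """Events (births or deaths) per `per` of population."""
    return events / population * per


# Hypothetical county: all figures below are invented for illustration.
pop_1920 = 50_000
pop_1930 = 60_000
deaths_1926 = 672

# Estimated population in 1926: about 56,000.
pop_1926 = interpolate_population(pop_1920, pop_1930, 1920, 1930, 1926)

# Crude death rate in 1926: about 12 per 1,000.
death_rate_1926 = crude_rate(deaths_1926, pop_1926)

print(round(pop_1926))
print(round(death_rate_1926, 1))
```

Any error in the estimated population passes directly into the rate. This is the reason specific rates for intercensal years, which require estimates for each age, sex, and ethnic group, are less reliable than crude rates.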

The administrative efficiency in the collection of vital statistics 
depends largely upon the public health organization of the several 
states. If state boards of health are seriously interested in vital 
statistics, they can gradually build up a satisfactory system of 
reporting. Because the use of vital statistics is of long standing, 
boards of health are more likely than other public departments 
to employ their information in the study of causes and of control. 
They usually employ a statistician whose main business it is to 
collect and analyze vital statistics. 

10. MORBIDITY 

Morbidity is a major social problem and is the cause of many 
other social problems, particularly those involving inadequacy of 
income. Illness occurs to every human being at some time in his 
life. Preventive medicine is aimed at reducing the frequency of 
disease and in some cases at its virtual elimination. Less is known 
about the occurrence of morbidity than of mortality. Even com- 
municable diseases are not reported to a central authority in all 
parts of the country, and acute diseases of a noncommunicable 
type are never reported unless they are treated in public hospitals. 
Private hospitals and physicians keep records for their own pur- 
poses, but these are not assembled in a central collecting agency so 
that they may be studied. 

In point of magnitude morbidity is one of the most important 
social problems. The economic loss due to loss of time from work 
and to actual medical costs is stupendous. Dr. Louis I. Dublin 
estimates that there are 150,000 physicians, 50,000 dentists, 150,- 
000 nurses, and 100,000 other employees concerned with the 
care of the sick. 15 The income of these groups and the costs of 
hospital service and medicine amount to about two billion dollars 
a year, or about 3.5 per cent of the national income. If the loss 
of time from work were added to the direct costs of illness, the 
total bill for illness would be much larger. 
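
The figures in this paragraph can be checked by simple arithmetic. The following minimal sketch in Python totals the personnel listed by Dublin and works out the national income implied by the statement that two billion dollars is about 3.5 per cent of it. The variable names are illustrative assumptions, and the implied income is a derived figure, not one given in the source.

```python
# Personnel concerned with the care of the sick, as quoted from Dublin.
care_personnel = {
    "physicians": 150_000,
    "dentists": 50_000,
    "nurses": 150_000,
    "other employees": 100_000,
}
total_personnel = sum(care_personnel.values())

# Direct annual cost of illness and its stated share of national income.
annual_cost_dollars = 2_000_000_000
share_of_national_income = 0.035

# National income implied by the two figures: about 57 billion dollars.
implied_national_income = annual_cost_dollars / share_of_national_income

print(total_personnel)                           # 450000
print(round(implied_national_income / 1e9, 1))   # 57.1
```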

Statistical study of noncommunicable diseases has been meager 

15 Dublin, Louis I., Health and Wealth, Chap. II. New York: Harpers, 1928. 
Other statistics in this paragraph are taken from the same reference. 





up to the present time. Professor George C. Whipple said in 1923: 
“It is much to be regretted that at the present time there is no 
adequate way of getting the facts in regard to sickness in the 
community due to diseases which are non-reportable. Sickness sur- 
veys are sometimes made, but they give only the facts at a given 
date, and are, moreover, very expensive to make. Hospital records 
help a little, the examinations made by the life insurance com- 
panies help a little, the recent examinations of men for the army 
have helped a good deal, but some day a more universal method 
must be devised.” 16 The situation has not changed much since 
Whipple made that statement. Some state boards of health make 
a commendable effort to collect morbidity statistics, but the results 
are too inadequate to be of much use for either scientific or administrative 
purposes. There are no laws compelling physicians to 
report all diseases; frequently there is no official agency to which 
they could report. The public has not attached the same impor- 
tance to reporting morbidity that it has to mortality. 

The study of the etiology and treatment of disease is the func- 
tion of the science of medicine. But the occurrence of disease is a 
social problem and may properly be the object of study by the 
social statistician. Even the medical man has made little effort to 
employ statistical methods as an aid to an understanding of dis- 
ease; he has been concerned with cases and has not made much 
use of quantitative studies. The social statistician is concerned 
largely with environmental data. It is properly his interest to 
seek more adequate data bearing on his problem and to present 
the results of his study as a contribution to the knowledge of the 
causes and control of disease in so far as environmental conditions 
play a part. Public health officers are obviously concerned with 
environmental factors. Dr. Thurman B. Rice, of the Indiana Uni- 
versity School of Medicine, has found that there is a geographic 
concentration of goiter in Indiana. This area has been so affected 
by geologic changes that the iodine in the soil has been leached 
out. Water in that area lacks iodine content, and food products 
grown there are deficient in iodine. Goiter is much less prevalent 
in adjoining counties which have been affected differently by geo- 
logic changes. The concentration of a type of disease in any region 
or population group suggests the presence of environmental fac- 

16 Whipple, George C., Vital Statistics, pp. 122, 123. New York: John Wiley & 
Sons, 1923. 





tors. But research of this kind cannot be done extensively until 
morbidity is more completely reported. 

11. INSANITY 

Insanity is both a medical and a social problem. As the former, 
it is receiving a great deal of attention from the medical profession. 
The case for its social study is equally strong because, aside from 
the possibility of a social etiology, mental disorder is a complicat- 
ing factor in many other social problems. Mental disorders are 
roughly classified as functional and non-functional. The difficulty 
of drawing a sharp distinction between these two classifications has 
so far made impossible a judgment as to the relative importance 
of physical and social causes. At the present time functional dis- 
orders constitute much the largest proportion of all mental dis- 
orders. “If we accept the opinion that certain neuroses and psy- 
choses are functional,” says Professor Ogburn, “and that they 
indicate a lack of psychological adjustment of man to civilization, 
then the very great probability of developing in the course of a 
lifetime a functional psychosis or neurosis certainly indicates a 
very serious psychological maladjustment between man and his 
civilization.” 17 The problem raised by Ogburn is an important one 
in social statistics, because the symptoms manifested by an insane 
person are often so complex that a determination of the definite 
cause is next to impossible, whereas a statistical analysis of a great 
many factors in the experience of a large number of persons with 
mental disorders might result in the discovery of significant cor- 
relations. The rate of insanity for different age groups increases 
with age for both males and females. The occurrence of insanity 
in different social groups has for the most part yet to be deter- 
mined. 

The magnitude of the problem of insanity, especially its social 
aspects, can be estimated but is not definitely known. 18 In 1923 
there were in mental hospitals 240 patients per 100,000 popula- 
tion over fifteen years of age in the United States. Since many 
patients recover and are discharged, the number of patients in 
hospitals represents a disproportionately large number of chronic 
cases, and does not afford a basis for estimating the incidence of 
mental disorders in the population. Probably a study of new ad- 

17 Ogburn, Wm. F., “The Frequency and Probability of Insanity,” American 
Journal of Sociology, Vol. XXXIV, No. 5, p. 831. 

18 The estimates in this paragraph are taken from Ogburn, op. cit. 





missions would be more satisfactory as reflecting increase or de- 
crease. New admissions in hospitals in the United States in 1910 
were 66 per 100,000 population over fifteen years of age, and in 
1927 the rate was 109. This is a marked increase due either to an 
actual increase of insanity or to more adequate hospital facilities. 
Ogburn estimates that one in twenty-two boys over fifteen years 
of age in New York State will probably be a patient in a mental 
hospital during his lifetime. Using some data obtained in the 
army medical examinations as a basis for estimating the number of 
persons in the population who may be afflicted with a mental dis- 
order, many of whom will not be hospital patients, he concludes 
that in Massachusetts and New York the chances are that one in 
ten of the population above fifteen years of age will be so afflicted. 
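
The size of the increase in new admissions, and the meaning of the "one in twenty-two" and "one in ten" estimates, may be made concrete by a short calculation. The sketch below, in Python, uses only the figures quoted above. The function name `percent_change` is an illustrative assumption.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from `old` to `new`, expressed in per cent."""
    return (new - old) / old * 100


# New admissions per 100,000 population over fifteen years of age.
admissions_1910 = 66
admissions_1927 = 109

# Increase in the admission rate: about 65 per cent.
increase = percent_change(admissions_1910, admissions_1927)

# Ogburn's estimates restated as probabilities.
chance_hospital_patient = 1 / 22   # about 4.5 per cent
chance_mental_disorder = 1 / 10    # 10 per cent

print(round(increase, 1))
print(round(chance_hospital_patient * 100, 1))
```

The calculation measures only the change in the recorded rate. It cannot distinguish an actual increase of insanity from the effect of more adequate hospital facilities.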

Another way of suggesting the size of the social problem of 
insanity is represented by the financial cost involved. In 1923 there 
were 153 state hospitals caring for insane patients, with a capital 
investment of $246,348,925.52 — these figures omit twelve other 
state hospitals. Maintenance in 1927 for the same hospitals 
amounted to $77,731,015. Thus it will be seen that the costs of 
insanity are great and constitute a large item in public budgets. 19 

In every state there is some organization for the collection of 
statistics of insanity, but the facts reported are adequate only for 
forming an estimate of the volume and cost of insanity and the 
types of cases in institutions. For statistical work, which would be 
useful in administration, many more data are required. Usually a 
hospital draws its patients from certain counties which constitute 
its district. Frequently a superintendent is impressed by the con- 
centration of cases in a county, for which there is no obvious ex- 
planation. More complete statistics of cases correlated with popu- 
lation data might lead to an understanding of the concentration. 
But to be useful to state hospitals this kind of work should be 
done currently in each state. 

The study of causes of insanity has been confined largely to 
cases; little systematic effort has been made to apply statistics in 
a thoroughgoing way. Yet this is a method offering much prom- 
ise of fruitful work, and it is indispensable for control. The mal- 
adjustments between man and his civilization which Ogburn has 
suggested as causes of insanity must be studied statistically and 
especially by the correlation technique. In this way it may in time 

19 Patients in Hospitals for Mental Disease, 1923 and 1927. Report of the 
United States Bureau of the Census. 





be possible to estimate the relative importance of hereditary, 
physical environmental, and social environmental factors. The so- 
cial factors are perhaps more amenable to control than either 
physical or hereditary conditions. It is, therefore, all the more 
important that the study of social factors be undertaken. 

12. MENTAL DEFICIENCY 

Mental deficiency is a broader term than feeble-mindedness. It 
includes the feeble-minded, but also many others who are less 
retarded. Dr. Stanley P. Davies quotes the following definition 
of feeble-mindedness from the Report of the Mental Deficiency 
Committee of England, 1929: “The only really satisfactory cri- 
terion of mental deficiency is the social one, and if a person is 
suffering from a degree of incomplete mental development which 
renders him incapable of independent social adaptation and which 
necessitates external care, supervision and control, then such a 
person is a mental defective.” 20 Others are relatively deficient, 
even though they may not be classified as feeble-minded. It was 
once believed that all feeble-mindedness was hereditary; that is, 
that it occurred only in families where one or both parents or 
recent ancestors were feeble-minded, but now it is believed that 
about half the feeble-minded children are so limited mentally 
because of environmental causes. Mental deficiency is usually indi- 
cated by inability to make normal social adjustments because of 
intellectual limitations. Such persons cannot profit normally from 
ordinary school training and cannot make satisfactory occupational 
adjustments except, in some cases of the higher-grade mentally 
deficient, in unskilled work. Imbeciles and idiots, the two lowest 
grades of mental defectives, require constant care and often can- 
not attend to their simplest personal needs. 

The number of mental defectives in the population has been 
variously estimated. Dr. Davies has estimated that there are prob- 
ably about eight feeble-minded persons per 1,000 population in 
the United States, which would make about 1,000,000 at the pres- 
ent time. 21 Perhaps twice as many more are deficient in a less 
degree. The latter constitute a greater social problem than the 
feeble-minded. They appear with disproportionate frequency 
among the applicants for charitable relief, the misfits in school, 

20 Davies, Stanley P., Social Control of the Mentally Deficient, p. 6. New 
York: Crowell, 1930. 

21 Ibid. 





the unemployed, the dependent children, the delinquent, and the 
criminal. They have greater difficulty than the individual of aver- 
age intelligence in making all social adjustments. 
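
The estimate of Dr. Davies rests upon a rate applied to the whole population. As a minimal sketch in Python, the population implied by his two figures, and the size of the less seriously deficient group, can be worked as follows. The implied population is derived from the quoted figures and is not a census count, and "twice as many more" is read here as twice the number of the feeble-minded.

```python
# Figures quoted from Davies.
rate_per_1000 = 8                 # feeble-minded persons per 1,000 population
feeble_minded = 1_000_000         # approximate total given in the text

# Population implied by the rate and the total (integer arithmetic).
implied_population = feeble_minded * 1_000 // rate_per_1000

# "Perhaps twice as many more are deficient in a less degree."
less_deficient = 2 * feeble_minded
all_mentally_deficient = feeble_minded + less_deficient

# Combined rate per 1,000 on the same implied population.
combined_rate_per_1000 = all_mentally_deficient * 1_000 // implied_population

print(implied_population)         # 125000000
print(all_mentally_deficient)     # 3000000
print(combined_rate_per_1000)     # 24
```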

Three general methods are available for dealing administra- 
tively with the mentally deficient: segregation in institutions, 
sterilization, and care in family homes. Segregation prevents re- 
production, if it is permanent, and provides care for the low-grade 
mental defectives, but it is very expensive. Sterilization definitely 
prevents reproduction, but it does not solve the problem of care. 
This method is perhaps better suited to the higher-grade defectives 
who are able to earn their living. Care in family homes is cheaper 
than institutionalization and under proper supervision it offers 
protection to society. Statistical records of administration are not 
sufficiently complete, and what records there are have been studied 
much less than they might have been. That is, the bookkeeping 
for mental defectives does no credit to the administrators. 

The causes of mental deficiency are known to some extent. A 
good many mental defectives inherited their deficiencies and 
probably carry defective germplasm themselves. Birth injuries, 
thyroid deficiency in the mother, certain kinds of illness in infancy, 
and congenital syphilis operate as environmental causes of mental 
deficiency. Little has been done in the way of quantitative studies 
of causes to determine their relative importance, their occurrence, 
or the possibilities of treatment. The work has largely consisted 
of a small number of case studies. The quantitative studies have 
been based upon too small a number of cases to give them gen- 
eral validity. A combination of case and statistical study offers an 
opportunity for fruitful research which should have important 
practical bearings. 

13. THE INTERRELATIONSHIPS AMONG SOCIAL PROBLEMS 

From the foregoing outline of social problems it is obvious that 
social situations requiring public or private attention are intricately 
bound together. The effort to give every child a minimum of 
education brings with it the complicating conditions of personality 
maladjustment, mental deficiency, dependency, delinquency, and 
crime. Old age is not simply accumulation of years of life; it becomes
a problem that is complicated by unemployability, illness,
dependency, and insanity. Delinquency is complicated by poverty, 
inefficient parents, mental deficiency, and personality maladjustment.
These illustrations emphasize the fact that social problems
have both social and physical causes and that treatment involves
consideration of both factors. The social worker never has a case 
that can be treated as one simple problem. The social history of 
the case and the social milieu of the individual have to be taken 
into consideration. Social statistics is one of the methods for deter- 
mining causes, points of concentration, and effectiveness of ad- 
ministration. 

In cities it has been noted that several kinds of social problems 
concentrate in the same areas. Delinquency, crime, and dependency 
have been found high in the same census tracts in Indianapolis, 
and in Cleveland several kinds of diseases have been found to 
concentrate in the same census tracts. In Chicago poverty, crime, 
and delinquency are associated. The ecological study of social prob- 
lems as suggested by the facts in these cities offers an interesting 
field for statistical research. 

What this brief survey of the field of social problems has in- 
tended to point out to the student is the growing reliance upon 
statistical methods and the obvious need for more adequate sta- 
tistical records and more systematic and continuous study of social 
problems as a public necessity. Social statistics is of primary impor- 
tance to scientific work and to efficient administration. 



CHAPTER II 


Sources of Published Statistics 


For thousands of years some kinds of social statistics have been 
kept by rulers and public officials. The clay tablets of ancient Baby- 
lonia reveal the fact that Hammurabi had a considerable amount 
of information about his people, particularly about the number 
and whereabouts of laborers and imperial slaves. The round popu- 
lation figures of the Old Testament indicate that the rulers had 
some conception of the numbers of their people and of their mili- 
tary man power. The Romans made estimates of the population 
of Rome and other cities, even if they did not take a careful census. 
The vast administrative system of the later Roman Empire necessi- 
tated some statistics. In the Middle Ages rather complete records 
were made of the population and status of the inhabitants of manors 
and feudatories. But it was not until the eighteenth century that 
social statistics in the modern sense began to be kept. Sweden has 
the longest record of population data of any country in the West. 
When the first census was taken in the United States, in 1790, 
one of the newest practices of modern governments was introduced. 
This census was a bare enumeration of the population for the 
purpose of determining representation in Congress. It had no sci- 
entific purpose, and its administrative purpose was limited to the 
relation between population and the number of representatives. 
In the decades since that date, the number of facts sought by the 
census-takers has gradually increased, so that it is now the most 
important and the most complete collection of sociological data 
in the country — this in spite of the fact that numerous agencies 
have arisen to collect social statistics. The census is our oldest ef- 
fort at statistical sociology, and it is used by many students for a 
great variety of purposes. But there are other agencies which 
collect social data of great importance, and it is the purpose of 
this chapter to indicate the nature of the work done by some of 
these agencies. No attempt is made at a complete list of such
sources of social statistics. Aside from the description of statistics
collected by the federal government, the sources mentioned merely 
illustrate types of agencies collecting statistics for administrative 
and scientific purposes. 

1. THE VALUE OF A KNOWLEDGE OF SOURCES

A knowledge of the sources of statistical data is useful in sev- 
eral ways. It prevents needless duplication of work and waste of 
time. A social statistician who failed to acquaint himself with these 
sources would be like a historian who studied the history of the 
American Revolution without examining the collections of docu- 
ments in the Library of Congress, the New York Public 
Library, and the Boston Public Library. If certain desired statis- 
tics are already in existence, the work of the investigator is less- 
ened just that much, for he can get the published records and 
proceed with his study from that point. Another value of a knowl- 
edge of, and familiarity with, sources lies in the fact that it devel- 
ops the habit of thinking of problems in terms of facts. Collections 
of data, like the census of population, may be presented in tables 
which can be further analyzed in the study of particular problems;
or they may be presented with a complete analysis, as, for example, 
the monographs of the National Bureau of Economic Research. 
In either case, while familiarizing himself with them, the student 
is learning to ask for conclusions based upon facts rather than upon 
speculative reasoning. A third reason for knowing the most im- 
portant sources of social statistics is that it develops the expecta- 
tion that as time passes generalizations about social matters will be 
checked, rechecked, and refined by the constant appeal to facts. 
The natural sciences have made progress by innumerable accre- 
tions, some small and some large, for what has been discovered by 
one worker is published to the world of his fellow workers, and 
the body of the science grows. The social sciences will in the same 
manner progress from mere philosophy to something approaching 
science. Available social statistics constitute a part of the working 
tools of the social scientist, and are a part of the basis of action for 
the social administrator. 

In social research an acquaintance with the sources of existing 
statistical data is indispensable. Two illustrations will make this 
clear: one concerns the costs of social institutions or organizations, 
and the other, the changing number of persons aided or affected 
in some way by these institutions and organizations. For several
decades the amount of money spent each year for the maintenance
of public charitable and correctional institutions has apparently
been increasing at a rapid rate. In Indiana expenditures for
this purpose increased from $1,991,005.27 in 1910 to $5,145,640.55
in 1929, an increase in nineteen years of 158 per cent.
But the purchasing power of money was changing during that 
time, and these dollars are not comparable, because in 1910 a 
dollar would buy more of the elements of maintenance than it 
would in 1929. When the dollars for the two different years are 
made comparable by the use of an index of general prices expressed 
in equivalent dollars, the amounts would be $2,073,964 and $2,874,659
respectively, or an increase of only 38.9 per cent in the
money cost of maintenance. Putting it another way, after the price
adjustment is made, the per capita expenditure for maintenance
of state institutions in Indiana was 77 cents in 1910 and 89 cents
in 1929. Obviously, one who is dealing with comparative costs of 
maintaining social institutions over a period of years should know 
something about indexes of the general price level. 
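
The price adjustment just described can be sketched in a few lines of Python. This is an illustration only, not the author's own worksheet. The index values 96 and 179 are assumptions: the text does not name the index or its base year, and they are inferred here from the ratio of the reported to the adjusted amounts. The function names are hypothetical. Computed from the adjusted amounts as printed, the increase comes to roughly 38.6 per cent, close to the figure quoted above.

```python
def deflate(nominal, price_index, base=100.0):
    """Convert a money amount to base-period dollars using a price index."""
    return nominal * base / price_index


def percent_increase(earlier, later):
    """Percentage change from an earlier value to a later one."""
    return (later - earlier) / earlier * 100.0


# Figures quoted in the text for Indiana state institutions.
nominal_1910 = 1_991_005.27
nominal_1929 = 5_145_640.55
adjusted_1910 = 2_073_964
adjusted_1929 = 2_874_659

# ASSUMPTION: index values inferred from nominal / adjusted amounts;
# the text does not state which index or base year was used.
index_1910 = 96.0
index_1929 = 179.0

# Increase in unadjusted dollars versus dollars of equal purchasing power.
print(round(percent_increase(nominal_1910, nominal_1929)))
print(round(percent_increase(adjusted_1910, adjusted_1929), 1))

# Deflating the reported amounts approximately reproduces the adjusted ones.
print(round(deflate(nominal_1910, index_1910)))
print(round(deflate(nominal_1929, index_1929)))
```

The point of the sketch is that the comparison of costs depends entirely on the index chosen; a different index of general prices would give somewhat different adjusted amounts.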

The number of inmates in these institutions was 10,587 on the 
last day of the fiscal year in 1910, but it was 17,477 in 1929, an 
increase of 65 per cent. But as it stands, does this represent a meas- 
ure either of the increase of the magnitude of social problems in 
Indiana or of an increased public interest in the persons for whom 
the institutions exist? Clearly it does not, because the population 
of the state has increased during these nineteen years. An accurate 
estimate of increase either in the problem individuals or in public 
interest must be based upon the number of persons in the state 
institutions per 100,000 population. In 1910 this was about 392;
in 1929 it was about 539. Although this shows an increase in the 
relative numbers in the institutions, it is much less than 65 per 
cent. The social statistician can hardly begin the study of any 
problem that will not require reference to the census of popula- 
tion for standardization purposes. If he deals with money, he must 
have recourse to a general price index. Some problems will require 
still other information which may be available in published statis- 
tics. The student should have some knowledge of these sources 
and should know where to find the supplementary data he requires. 
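
The standardization by population can be illustrated in the same way. The following is a minimal sketch in Python with hypothetical function names. The text does not print the population figures it used, so the populations below are back-calculated from the reported rates and should be read as approximations, not as census figures.

```python
def rate_per_100k(count, population):
    """Number of cases per 100,000 of population."""
    return count / population * 100_000


def percent_increase(earlier, later):
    """Percentage change from an earlier value to a later one."""
    return (later - earlier) / earlier * 100.0


# Figures quoted in the text.
inmates_1910, inmates_1929 = 10_587, 17_477
rate_1910, rate_1929 = 392, 539  # inmates per 100,000 population

# ASSUMPTION: populations implied by the rates above, not census values.
implied_pop_1910 = inmates_1910 / rate_1910 * 100_000
implied_pop_1929 = inmates_1929 / rate_1929 * 100_000

raw_growth = percent_increase(inmates_1910, inmates_1929)
rate_growth = percent_increase(rate_1910, rate_1929)

print(round(raw_growth))       # growth in the bare number of inmates
print(round(rate_growth, 1))   # growth after allowing for population
```

On these figures the rate per 100,000 rises by about 37.5 per cent, against 65 per cent for the unstandardized count, which is the contrast the paragraph above draws.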

Teachers of the social sciences in high school and college find 
published statistics useful in their work. If a teacher knows his 
social statistics, his discussions of social problems or of general 
sociology will not have to be limited to qualitative analysis, but
can be supported by statistics. At best, the teaching of these subjects
will be heavily weighted with opinion and speculation, but
the more facts he knows and the greater his insight into their sig- 
nificance, the smaller the margin of opinion becomes. He is more 
independent of “authorities,” and he acquires sufficient knowledge 
to be entitled to his own judgment in his field. By constant recourse 
to statistics applying to his subject, he develops the habit of ap- 
pealing to facts. In other words, he teaches a body of knowledge 
and method, and not simply opinions. 

Social administration is increasingly becoming the work of tech- 
nically trained persons. The high executive may be less of a 
specialist than some others in his organization, but he depends for 
effective administration upon the work of experts. Writing of 
public administration, Leonard D. White says: “As these trends 
move on from decade to decade, they emphasize the decline of 
the amateur and the dominance of the expert. The amateur administrator
long ago lost his hold on the national services and is disappearing
in the larger local services as well.” 1 This applies to
private social organizations as well as to public institutions, bureaus, 
and departments. The administrator must know the facts which 
are important in his own organization as well as any others which 
have a bearing upon his efficiency. An understanding of statistical 
sources and studies in his field enables him to support, if not to 
replace, “hunches” by facts and by conclusions based upon a suffi- 
ciently large number of pertinent facts to make them trustworthy. 
It puts him in touch with the latest and best knowledge about 
problems similar to his own, and he benefits from the experience 
of others. He discovers new methods of analyzing his own prob- 
lems and learns of sources of statistical data which aid in their 
study. 

Two other values of a knowledge of sources should also be 
mentioned. Such knowledge provides a background upon which 
the investigator can block out the general situation in which he 
is interested. From these he obtains a picture of his problem and 
develops a perspective, both of which are important in planning 
studies and in interpreting the results of investigation. The other 
value lies in the fact that the investigator is enabled to relate his 
facts, which may involve special local and temporary variations, 
to the general trends shown by similar facts assembled from many
sources. This kind of comparative study prevents hasty conclusions
and the hasty adoption of policies.

1 White, Leonard D., “Public Administration,” in the Encyclopaedia of the
Social Sciences, Vol. I, p. 448. New York: The Macmillan Co., 1930.

2. FEDERAL GOVERNMENT STATISTICS 

The United States government is the largest collector of sta- 
tistics in the country. The public has little conception of the amount 
of statistical work done by the government, a situation partly 
explained by the fact that statistics do not make easy reading, and 
that it takes more than an ordinary newspaper reporter to write 
them up in a manner to make them front-page news. The statis- 
tical information which does find its way into news reports usually 
has some popular aspect that can be seized upon and played up. 
But the statistics which the student and the administrator find 
interesting are bound in bulky volumes or issued in paper-back 
bulletins. Dr. Lawrence F. Schmeckebier has performed a useful 
service in bringing together in one volume a description of the 
statistical work of our government. 2 Although much of the gov- 
ernment’s statistical work is concerned with matters not convention- 
ally included in the category of social statistics, a brief outline of 
the types of this work will not be out of place here. The headings 
of Dr. Schmeckebier’s chapters will suggest the scope of this work: 
(1) population in general, method of collecting data and the classification
of the population; (2) special statistics of Negroes, Indians,
Chinese, and Japanese; (3) dependents, defectives, and delin- 
quents; (4) immigrants and emigrants; (5) occupations; (7) 
births; (8) deaths, diseases, and accidents; (9) marriage and di- 
vorce; (10) religious bodies; (11) education; (12) labor and 
wages; (13) women and children; (14) general agricultural con- 
ditions; (15) production of crops; (16) livestock; (17) livestock 
products; (18) production of minerals; (19) products of fisheries; 
(20) production of manufactured articles; (21) surveys of indus- 
tries; (22) imports and exports; (23) land transportation and 
communication; (24) shipping; (25) domestic commerce; (26) 
water power and electric power; (27) prices; (28) finances of the 
national government; (29) public finances other than national; 
(30) general statistics of cities; (31) money and banking; (32) 
income and national wealth; (33) statistics of noncontiguous territory,
that is, Alaska, Hawaii, Porto Rico, etc.; (34) statistics of
foreign countries; and finally (35) miscellaneous kinds of statistics.
Published statistics on these subjects can be obtained either
from the department or bureau issuing them or from the Superintendent
of Documents, Washington, D. C.; in many cases they
are distributed free, in others there is a small charge. “At the
present time the statistical work of the United States government
compares favorably, both in extent and quality, with that of any
government in the world,” says Dr. Schmeckebier. “In the field
of manufactures, especially, there is nothing in the work of foreign
governments that can be compared with our biennial statistics.” 3
The truth of this judgment is further borne out by the fact that
the Superintendent of Documents publishes the Monthly Catalogue
of Public Documents so that anyone may examine the current
publications in his special field. This catalogue is published
like a journal, and the subscription price is fifty cents a year. It
lists many documents which are not statistical, but the proportion
of listed documents containing statistics is large.

2 Schmeckebier, Lawrence F., The Statistical Work of the National Government.
Baltimore: Johns Hopkins Press, 1925. This is a publication of the Institute
for Government Research.

See also Fry, C. L., “Making Use of Census Data,” Journal of the American
Statistical Association, pp. 129-138, June, 1930.

The Bureau of the Census collects and publishes more statistics 
than probably any other division of the national government. Some 
of the information it collects will doubtless surprise the student 
who is familiar with the census mainly as the population of the 
nation, states, counties, and cities. The Bureau summarized the 
scope of its work recently in the following paragraphs: 4 

“The Bureau of the Census takes the decennial census of the 
United States covering population, agriculture, irrigation, drainage, 
manufactures, mines and quarries, distribution, and unemploy- 
ment, and is continuously engaged in the compilation of other 
statistics covering a wide range of subjects. 

“Statistics regarding the dependent, defective, and delinquent 
classes in institutions; public debt, national wealth and taxation; 
religious bodies or churches; and transportation by water are com- 
piled every tenth year in the period intervening between the de- 
cennial censuses; and statistics of electric light and power plants, 
electric railways, telephones, and telegraphs every fifth year. 

“A special census of agriculture is taken every fifth year following
the decennial census; and a census of manufactures is taken
biennially. 

3 Op. cit., p. 1.

4 List of Publications of the Department of Commerce, edition of May 15,
1930, p. 13.

“Statistics of births, deaths, marriages, and divorces are com- 
piled annually; also financial statistics of cities and States; and 
statistics of prisoners in State prisons and reformatories, and of 
patients in hospitals for mental diseases and in institutions for epi- 
leptics and feeble-minded. 

“At monthly intervals statistics are published relating to cotton 
supply, consumption, and distribution; to cottonseed and its products;
and at approximately semi-monthly intervals during the
ginning season reports are issued showing the amounts of cotton 
ginned to specified dates. 

“The Bureau also collects monthly or quarterly data regarding 
the production or supply of many other commodities, including 
hides, skins, leather and leather goods, clothing, and wool. Current 
reports for these industries and commodities are multigraphed and 
issued as soon as the returns are tabulated. These reports are dis- 
tributed free of charge, and a complete list of those available may 
be obtained from the Director of the Census. 

“The Bureau publishes the monthly Survey of Current Business, 
compiling from various sources data regarding the movement of 
prices, stocks on hand, production, etc., for various lines of trade 
and industry, together with such other available data as may throw 
light upon the business situation.” 

The best known, and one of the most important, divisions of 
the work of the Bureau is the census of population. The headings 
of the schedule used for taking the census in 1920 were substan- 
tially as follows: 5 

Place of abode: 

1. Street, avenue, road, etc. 

2. House number or farm.

3. Number of dwelling house in order of visitation. 

4. Number of family in order of visitation. 

5. Name of each person whose place of abode was in this family. 

Relation: 

6. Relationships of persons enumerated to head of the family. 
Tenure: 

7. Home owned or rented.

8. If owned, free or mortgaged. 

Personal description: 

9. Sex. 

5 Schmeckebier, op. cit., p. 18.

10. Color or race. 

11. Age at last birthday. 

12. Single, married, widowed, or divorced. 

Citizenship: 

13. Year of immigration to the United States. 

14. Naturalized or alien. 

15. If naturalized, year of naturalization. 

Education: 

16. Attended school any time since September 1, 1919. 

17. Whether able to read. 

18. Whether able to write. 

Nativity and mother tongue: 

Person enumerated: 

19. Place of birth. 

20. Mother tongue. 

Father of person enumerated: 

21. Place of birth. 

22. Mother tongue. 

Mother of person enumerated: 

23. Place of birth. 

24. Mother tongue. 

Ability to speak English: 

25. Is person enumerated able to speak English? 

Occupation:

26. Trade, profession, or particular kind of work done. 

27. Industry, business, or establishment in which at work. 

28. Employer, salary or wage worker, or working on own 
account. 

In addition to this information, the census of 1930 included sev- 
eral questions about unemployment. The unemployment schedule 
was filled out only by persons who usually worked, were then out 
of work, were able to work, and were looking for work. 

The census of population is a complete enumeration of everyone 
in the country, and because of this fact it is invaluable as an aid 
to testing the representativeness of data collected for special stud- 
ies involving population. For example, in a study of crime the 
offenders may be classified by age. Is there a concentration at cer- 
tain ages? This question can be answered by comparing the age 
distribution of the population, as reported by the census, for the 
same area from which the crime data are drawn, with the age
distribution of the offenders. In many other ways the census of
population may be used as a tabulation of the standard distribution 
of population characteristics, with which data collected for special 
purposes may be compared for testing the representativeness of 
the sample and for determining deviations from the normal dis- 
tribution of the characteristics of the total population. 
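
The comparison of a special group with the census distribution may be sketched as follows. The counts are entirely hypothetical and serve only to show the arithmetic; the age groups, the function names, and the ratio used as a measure of concentration are assumptions made for illustration and are not taken from any census report.

```python
def shares(counts):
    """Convert counts by group into proportions of the total."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def concentration_ratios(sample_counts, census_counts):
    """Ratio of each group's share in the sample to its share in the census.

    A ratio above 1 means the group is over-represented in the sample
    relative to the general population; below 1, under-represented.
    """
    sample = shares(sample_counts)
    census = shares(census_counts)
    return {group: sample[group] / census[group] for group in census_counts}


# HYPOTHETICAL counts, invented for illustration only.
offenders = {"15-24": 40, "25-34": 30, "35-44": 20, "45 and over": 10}
census_population = {
    "15-24": 2000,
    "25-34": 2000,
    "35-44": 2500,
    "45 and over": 3500,
}

ratios = concentration_ratios(offenders, census_population)
for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}")
```

With these invented figures the youngest group has twice the share among offenders that it has in the population, which is the kind of concentration the census comparison is meant to reveal.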

The unemployment census of 1930 undertook to enumerate all 
persons who were out of work because of the depression. The 
questions were so framed that they would exclude those who were 
idle because of illness, who quit their work voluntarily, who had 
been discharged for cause, who did not want to work, or who were 
out because of a seasonal decline in their occupations. Such a census 
had never been attempted before by the Bureau, and great diffi- 
culties were encountered in preparing the unemployment schedule. 
There was much criticism of the reliability of the results both 
before the census was taken and later when the results began to 
be published. The crucial problem was to define an unemployed 
person in such a way that the enumerator could recognize one and 
record the information asked for with a high degree of accuracy. 
This problem arises for the Bureau every time a decision is made 
to include an additional item in the census schedule. 

Another important report of the Bureau deals with occupations, 
on which data have been obtained at each census since 1830. Prior 
to 1910 occupations were returned in terms of the industry with 
which the individual was connected. This was unsatisfactory be- 
cause the types of occupations had changed greatly and because it 
did not permit the detailed analysis of occupations which later 
statistical inquiries necessitated. Consequently, the method of tak- 
ing this census was entirely revised in 1910, and occupations were 
defined in terms of the worker and the particular job he did, re- 
gardless of the major industry of which he was a part. In the 
study of any problem touching child labor, school attendance, 
changing types of occupations, number engaged in gainful occupa- 
tions, and geographical distribution of occupations, the census data 
are of inestimable value. 

The Bureau of the Census has been collecting various kinds of 
institutional statistics since 1830, the first being those on the blind 
and deaf obtained in that year. In 1840 data were collected on 
the insane and feeble-minded, and in the census of 1850 data on 
paupers and delinquents were included for the first time. Four 
special reports were published in 1904, 1910, 1915, and 1923,
giving data concerning benevolent institutions; this classification
includes children’s homes, day nurseries, hospitals, dispensaries,
permanent homes, temporary homes, and schools and homes for
the blind and deaf. Since 1890 all Bureau reports concerning dependents,
defectives, and delinquents have been issued as special
reports and not as parts of the decennial census. An annual census 
of prisoners, the insane, the feeble-minded, and the epileptic in 
institutions has been taken since 1926. These reports are of great 
value in showing growth but for the most part they reflect only 
the magnitude of the problems as indicated by institutional populations,
capital investments, and cost of maintenance. They do not
purport to measure the extent of dependency, defectiveness, and 
delinquency in the whole population; but, used with a full con- 
sciousness of their limitations, these census reports are valuable. 6 

Marriage and divorce have been the subject of census publica- 
tions since 1899. In that year a compilation of marriages and 
divorces from 1867 to 1886 was made and published by the Bureau
of Labor. Later another report was issued by the Bureau of the 
Census which included the data of the older report and brought 
them down to 1906. It was expected that a further report would 
cover the period from 1907 to 1916, but the war intervened, and 
this report was limited to marriages and divorces for the year 
1916. Beginning with 1922, however, the Bureau of the Census 
has published annual statistics of marriage and divorce. The data 
are given by geographical divisions as follows: the nation, groups 
of states, states, and counties. The number and rates of marriages 
for the population 15 years of age or older are given, and divorces
are presented in tables showing the number by age and sex, the 
cause of divorce, children involved, and rates per 1,000 married 
people. The collection of these statistics is not complete for the 
entire country. “The statistics of marriages,” says a bulletin of the 
Bureau, “are now obtained from some office of the State govern- 
ment in 29 States, and the statistics of divorces are likewise ob- 
tained from State officials in 16 States. In the other States county 
officials furnish the information.” 7 The reliability of the reports of 
the county officials is questionable, and, in addition, all of them 
do not report regularly. But the reports for the 29 states on
marriages and the 16 states on divorces are probably fairly representative
of the country as a whole. In states where new laws
affecting marriage and divorce have been enacted such statistics
enable students and administrators to determine to some extent
the effects of this legislation.

6 Schmeckebier, op. cit., Chap. V.

7 Annual Report of the Director of the Census, June 30, 1930, p. 20.

Vital statistics are valuable not only in themselves but for their 
use in the study of a variety of social problems. Since 1915 the 
Bureau of the Census has annually published the statistics of births 
in the registration area. This area included the District of Co- 
lumbia and the states which provided by law for the registration 
of births. In 1930, the registration area covered 46 states, South 
Dakota and Texas being the only ones not included; in South
Dakota, however, one city reports, and in Texas eight cities report. 
Statistics of deaths are now available for the same states. The 
reporting of deaths according to a registration area began in 1880 
with Massachusetts and New Jersey; but by 1890, six states were
reporting systematically, and this number has steadily increased 
since that time. Since the organization of the permanent Bureau 
of the Census in 1900, annual statistics of deaths have been pub- 
lished as “Mortality Statistics.” Mortality and birth rates are 
estimated each year, but only in the census years can an accurate 
calculation be made. 

A census of religious bodies has been made since 1850. Ques- 
tions to obtain this information were asked at the time of the 
regular census in 1850, 1860, 1870, and 1890. No general statistics
were collected in 1880, though a report on cities did give certain 
information for the cities only. The law was changed after 1890 
to provide that a census of religious bodies should be made every 
ten years, but not in the year of the decennial census, and in ac- 
cordance with this provision special reports were prepared in 1906, 
1916, and 1926, giving the number of organizations of each de- 
nomination, the number of communicants by denominations, and 
the geographical distribution. These reports are of great usefulness 
for studying the church as a social institution. 8 

In addition to the volumes of data published by the Bureau of 
the Census, a number of important monographic studies of special 
problems have been made in recent years. These monographs deal 
with certain of the more important subjects covered by data col- 
lected by the Bureau, and may be obtained from the Bureau or 
from the Superintendent of Documents. 

8 Schmeckebier, op. cit., Chap. XI.

The Children’s Bureau of the Department of Labor is engaged
in the work of promoting the welfare of children. Dr. Schmeckebier
says of this Bureau: “The Children’s Bureau of the Department
of Labor is concerned with the study of questions relating to
child life, and a portion of its work has a statistical basis, the 
remainder being descriptive and expository and dealing with such 
subjects as child-labor laws, illegitimacy, and health of mothers 
and children. The statistical publications of this Bureau are not 
issued at regular intervals, and do not form a series covering the 
same field for a number of years. Each one relates to a specific 
topic in a limited area, and is a complete study for the particular 
period, topic, and area covered.” 9 What Dr. Schmeckebier says is 
correct, but it does not lessen the value of the publications of the 
Children’s Bureau for their own purposes or for study by those 
who wish to gain an understanding of how studies in child welfare 
are made. 

Since the publication of Schmeckebier’s work this Bureau has 
begun the publication of a regular series of statistics, known as the 
monthly reports of the Registration of Social Statistics. It was 
begun July 1, 1930, and at present is issued in monthly reports.
The Bureau provides a schedule which is mailed to a number 
of large cities which have agreed to cooperate with it in collect- 
ing the data. The schedule calls for information regarding family 
welfare and relief, mothers’ and widows’ pensions, non-institu- 
tional service to ex-soldiers and their families, free legal aid, 
travelers’ aid, dependent or neglected children in foster homes 
or in institutions, applications for the care of dependent or neg- 
lected children, case work for such children, children in detention 
homes, protective case work for young people, care of children in 
day institutions, adult probation, temporary shelter for homeless 
or transient persons, maternity homes, hospital in-patient service, 
clinic and dispensary out-patient service, medical and psychiatric 
social service, public health nursing, and school health service. 
Each classification represents a table which shows a detailed analy- 
sis of data pertaining to it. Reports from a city are not used 
unless they include information from substantially all the agencies 
rendering the particular service. In this respect the Registration 
of Social Statistics differs from all other plans of social reporting. 
The public agencies print reports of their own statistics, but they
omit reports of similar work by privately supported agencies. In 
some cities a community chest obtains statistical reports from all 
its member agencies, but these reports leave out the publicly supported
agencies. The reports of the Children’s Bureau attempt to
cover completely the cities for which such data are published. They
began much as the Bureau of the Census did in the case of vital
statistics, namely, with a registration area.

9 Op. cit., p. 176.

The amount of research that can be done on the basis of the 
Children’s Bureau reports is as yet quite limited, but suggestive 
analyses may be made. This work was taken over from the Joint 
Committee of the Association of Community Chests and Councils 
and the Local Community Research Committee of the University 
of Chicago, and some analysis of the data has been published in 
the form of reports for 1928 and 1929. 10 The Joint Committee 
got the work of reporting under way, and then it was taken over 
by the Children’s Bureau. These two annual reports and the cur- 
rent monthly reports of the Children’s Bureau are exceedingly 
useful in teaching social statistics. They offer opportunities for the 
analysis of the social statistics themselves, and these data can be 
the basis for formulating certain problems which require the use 
of the census of population and of occupations. Vital statistics may 
also be introduced and correlated with the social data. Thus, there 
might be set up a research project which would extend over a 
considerable period of time and would involve the use of a good 
many of the common statistical methods. 

Another governmental bureau whose statistical reports may not 
be classified wholly as social statistics but much of which are social 
statistics is the Bureau of Labor Statistics. “In carrying out the 
purpose for which the Bureau of Labor Statistics was created,” says 
a bulletin of the Bureau, “data are collected in various ways from 
various sources — by personal visits of agents in the field and from 
correspondence, by consulting reports, trade journals, and other 
publications, by contract with experts to make special studies, and 
in other ways. All of the material in the publications of the bureau, 
whether prepared in the bureau or contributed by persons specially 
contracted with, is carefully edited in the office, and all facts and 
figures verified, whenever practicable, by comparison with the 
original sources.” 11 Some of the statistics published by the Bureau 
are primarily economic, but for the most part they have much 
wider social implications. 


10 See McMillen, A. W., Measurement in Social Work, for a study of the data 
for 1928 and 1929. Chicago: University of Chicago Press, 1930. 

11 Methods of Procuring and Computing Statistical Information of the Bureau 
of Labor Statistics, Bulletin No. 326, 1923, p. 1. Much of the material contained 
in the following paragraphs is taken from this bulletin. 





The Monthly Labor Review, the official periodical of the Bureau, 
has been published since 1915 and is the Bureau’s medium 
for the presentation of reliable information concerning labor in all 
its aspects. The following subjects are given special attention in 
the Review: “Wholesale and retail prices and cost of living; wages 
and hours of labor; productivity and efficiency of labor; minimum 
wage; industrial relations and labor conditions; woman and child 
labor; labor agreements, awards, and decisions; employment and 
unemployment; vocational education; housing; industrial acci- 
dents and hygiene; workmen’s compensation and social insurance; 
labor legislation; decisions of courts relating to labor; labor or- 
ganizations; strikes and lockouts; conciliation and arbitration; 
immigration; cooperation; employees’ representation; welfare 
work; profit sharing; etc.” 12 Several of these subjects obviously 
will not receive statistical treatment in the Review, but many of 
them cannot be discussed without recourse to statistics, and most 
of them involve statistical considerations at some point. Articles 
which present statistical tables deal with the following subjects: 
changes in membership in unions connected with the construction 
industry, transportation unions, mining, oil and lumber unions, 
paper, printing and bookbinding unions, clothing unions, etc., be- 
tween 1926 and 1929; the development of credit unions by states 
and cities; unemployment surveys in several cities; industrial 
accidents; consumers’ cooperation; labor turnover; industrial dis- 
putes; housing; wages and hours of labor; the trend of employ- 
ment; wholesale and retail prices; the cost of living. 13 For the 
student of labor problems and for the research worker there is 
much material of value in this one number of the Review. Other 
numbers cover similar ground but with a varying amount of statis- 
tical material on different topics. 

No other source is so comprehensive in its treatment of wages, 
hours of labor, pay-roll data, and labor turnover as are the 
Monthly Labor Review and certain special reports of the Bureau. 
For example, here is found authoritative information about increases 
or decreases in wage rates. Prominent politicians have been 
known to assure the public, in the depression of 1929, that 
wage rates were being maintained, when the monthly reports of 
the Bureau showed a strong tendency for employers to reduce 
them. Changes in hours of labor are recorded at length. The 

12 Op. cit., p. 52. 

13 Monthly Labor Review, February, 1930, Vol. 30, No. 2. 





indexes of factory and of steam railroad employment show the 
national tendency toward depression or prosperity, or they reflect 
the displacement of men by machines. A decline in the amount 
of pay-rolls reflects declining employment or reduction of wages. 
Such data and indexes have a wide range of uses in the study of 
social problems. 

Closely allied to wage rates and pay-rolls, and also related to 
wholesale and retail prices, is the cost of living. The index of the 
cost of living published by the Bureau of Labor Statistics is con- 
structed on the basis of several hundred items entering into the 
family budget. The year 1913 is taken as 100 per cent, and 
indexes of subsequent years are expressed as percentages of the 
average cost of living in that year. If wage rates have increased, 
have they gone up as fast or faster than the cost of living? The 
index of the cost of living makes this kind of comparison possible. 
Prices of some articles of consumption change more rapidly than 
others. In order to show which items in the family budget are 
lowering or raising the cost of living, index numbers of the cost 
of separate items — food, clothing, rent, fuel and light, etc. — have 
been computed and are published currently. Index numbers of 
retail and wholesale prices are also published. While these are 
similar to the index for the cost of living, they are not identical, 
since the items entering into the latter index are weighted in 
accordance with their importance in the family budget. 
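
The comparison of wage rates with the cost of living described above can be sketched in a few lines of Python. The wage and budget figures are invented for illustration and are not taken from the Bureau's reports; only the convention of taking 1913 as 100 comes from the text, and the function names are hypothetical.

```python
# Illustrative only: the wage and budget figures are hypothetical,
# not Bureau of Labor Statistics data. Base year 1913 = 100, as in the text.

def index_on_base(series, base_year):
    """Express each year's value as a percentage of the base-year value."""
    base = series[base_year]
    return {year: 100.0 * value / base for year, value in series.items()}

def real_wage_index(wage_index, cost_index):
    """Wage index divided by cost-of-living index, times 100.

    Above 100: wages have risen faster than the cost of living since the
    base year. Below 100: wages have lagged behind it.
    """
    return {year: 100.0 * wage_index[year] / cost_index[year]
            for year in wage_index}

# Hypothetical average weekly wage (dollars) and cost of a fixed family budget.
weekly_wage = {1913: 15.00, 1920: 30.00, 1929: 27.00}
budget_cost = {1913: 800.00, 1920: 1680.00, 1929: 1360.00}

wage_idx = index_on_base(weekly_wage, 1913)
cost_idx = index_on_base(budget_cost, 1913)
real_idx = real_wage_index(wage_idx, cost_idx)

for year in sorted(real_idx):
    print(year, round(wage_idx[year], 1), round(cost_idx[year], 1),
          round(real_idx[year], 1))
```

In this invented series wages lag behind the cost of living in 1920 (a real-wage index below 100) and run ahead of it in 1929 (above 100), which is the kind of question the text says the index makes it possible to answer.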

Industrial accidents play an important part in many social prob- 
lems beyond the injury to the worker and his temporary loss of 
wages. Workmen’s compensation laws have helped to relieve the 
immediate economic distress of the worker’s family, but they only 
help. Permanent partial disability leaves him with less earning 
power, and total disability removes him entirely as a source of 
income to his dependents. Dependency of his family may result 
with a long array of attendant evils. The Bureau of Labor Statis- 
tics publishes a summary of statistics of industrial accidents which 
are gathered by the National Safety Council (Chicago); it also 
shows comparative rates for different industries and for the same 
industry in different years. A comprehensive study of dependency 
and charitable relief has to take into account the permanent effects 
of industrial accidents, and this fact alone gives the Bureau reports 
added importance for students of social statistics. 

The United States Public Health Service is another source of 
important statistics. Ill health has many ramifications which appear 
as complicating factors in a variety of social problems. The 
collection of morbidity statistics, however, has lagged far behind 
the development of statistics of births and deaths. It is a simple 
matter to record a birth, and it is equally simple to record a death, 
though more difficulty may be encountered in stating the cause of 
death. When the whole range of morbidity is considered, it is not 
surprising that statistical reporting lags. More advance has been 
made in reporting communicable diseases than others. Such diseases 
as diphtheria, measles, smallpox, and tuberculosis are more easily 
diagnosed than some others, and the public has a very real interest 
in knowing the time and location of cases. But some communicable 
diseases, like gonorrhea and syphilis, are not adequately reported, 
because most physicians still regard such information about their 
private patients as confidential and refuse to report it unless com- 
pelled by law. Cases of venereal diseases in public hospitals are 
likely to be reported, but this group constitutes a small proportion 
of all such cases. Nevertheless, some progress is being made in 
reporting morbidity. The United States Public Health Service, 
through its Division of Sanitary Reports and Statistics, is attempt- 
ing to systematize the national reporting of various diseases, par- 
ticularly communicable diseases. Since this service is organized all 
over the world, special precautions may be taken if yellow fever, 
cholera, or other similar diseases appear in a foreign port with 
which Americans have frequent contact. “The collection and dis- 
semination of information concerning the prevalence of disease is 
of increasing importance in this age of speedy transportation facil- 
ities. For instance, it is possible that a person infected with typhoid 
fever may, even by motor, traverse the entire width of the country 
before completion of the incubation period of this disease.” 14 In 
order to acquaint the public with the location and prevalence of 
reportable diseases, the Public Health Service publishes a weekly 
report for general circulation. This is of primary importance to 
public health officials and to others associated in some way with 
public health work, but the data published are frequently useful 
to research workers in other fields who require health data for 
problems under investigation. 

As stated, the Public Health Service extends throughout the 
world. “. . . every consul and consular officer stationed abroad 
makes a weekly report to the Public Health Service as a part of 

14 Public Health Reports, Vol. 46, No. 6, p. 285. These reports are issued 
weekly by the United States Public Health Service. 





his routine duties. The reports are made on forms provided by 
the Public Health Service and bearing a list of the more important 
communicable diseases. The consular officer obtains reports from 
health officials of the country to which he is accredited, and from 
these reports and such other sources as are available he fills in the 
information required on the form and mails it to the Public 
Health Service. These reports by mail cover the following diseases: 
Cerebrospinal meningitis (epidemic); cholera, Asiatic; 
cholera nostras, cholerine, or gastroenteritis; diphtheria; measles; 
plague, human; plague, rodent; poliomyelitis (acute anterior 
poliomyelitis or infantile paralysis); scarlet fever; smallpox; tuberculosis; 
typhoid fever (enteric fever, typhus abdominalis); 
typhus fever (typhus exanthematicus); and yellow fever.” 15 “In 
the domestic field the Public Health Service is kept informed of 
conditions by weekly reports mailed in from local health officials 
in 570 cities of 10,000 or more population. These reports cover 
the prevalence for their respective territories of the following 
diseases: Chicken pox, diphtheria (carriers not included), influenza, 
measles, mumps, pneumonia (all forms), scarlet fever, smallpox, 
tuberculosis (all forms), typhoid fever, whooping cough, cerebro- 
spinal fever, dengue, lethargic encephalitis, pellagra, poliomyelitis 
(infantile paralysis), rabies (in man) (developed cases), rabies 
(in animals), typhus fever.” 16 The second half of the weekly reports 
gives statistics of these diseases in two parts: first, the United 
States, and, second, foreign nations and the island possessions of 
the United States. 

Besides the weekly reports of morbidity, the Public Health 
Service makes special surveys of public health work, one of the 
most recent of which was a study of this work in Oklahoma. 17 
This report covers a study of the law creating the State Board of 
Health, the administration of the department, the organized 
medical profession in Oklahoma, the state educational authorities, 
and unofficial health agencies; and it mentions two outstanding 
defects: “1. The failure to do any more than scratch the surface 
in the most important field of public health, viz., the hygiene of 
the preschool child. 2. The lack of properly organized local health 
units to apply, locally, the policies of the State Health Depart- 
ment.” 18 The report is largely nonstatistical, but it is based upon 

15 Ibid. 

16 Ibid., p. 286. 

17 Public Health Reports, March 13, 1931, Vol. 46, No. 11, pp. 575-598. 

18 Ibid., p. 577. 




the collection of facts which it seemed unnecessary to present in 
detail. 

Other departments and bureaus of the national government 
publish statistics useful to the social statistician, but the work of 
those described above constitutes the most important sources. In 
special research work economic and technical statistics may be de- 
sirable, and these can be obtained by applying to the proper de- 
partment or bureau. If it is not known what office publishes them, 
recourse may be had to Dr. Schmeckebier’s book, The Statistical 
Work of the National Government, which gives a brief description 
of all types of statistical work done and states where the 
various reports may be obtained. 

3. SOCIAL STATISTICS OF STATES 

Many statistical reports are issued by departments of the state 
governments. Those of most interest as social statistics are the 
annual, biennial, and quadrennial reports of state boards of ad- 
ministration, control, charities and corrections, and public or social 
welfare. The Russell Sage Foundation 19 has listed forty states and 
the District of Columbia as having some kind of boards which 
concern themselves with what is commonly called public welfare 
work. Some of these boards merely supervise the financial affairs 
of state institutions, while others collect statistics and act in an 
advisory relation to these institutions, and still others are the cen- 
tral administrative body for them. All these boards publish statis- 
tical reports. For some purposes these reports are more useful in 
teaching research methods than the summaries covering similar 
subjects published by the national government, because the facts 
are given in greater detail. We shall give a brief description of the 
social statistics published by a few of these state departments. 

One of the oldest is the Department of Public Welfare of 
Massachusetts. It was organized in 1863 as the Board of State 
Charities, and its long history makes its annual reports of great 
value in the study of the development of public welfare work in 
Massachusetts. The report is divided into sections for aid and re- 
lief, child guardianship, and juvenile training, under the direction 
of which come such services as indoor and outdoor poor relief, 
mothers’ aid, care of handicapped children, of dependent and 
neglected children, and of delinquent children in institutions. An 

19 Directory of State Boards, bulletin of the Russell Sage Foundation Library, 
No. 96, August, 1929. 





annual report made of these various kinds of work shows both the 
number given service and the cost of the service to the state. The 
most adequate table is that of statistics of public poor relief. 20 
The Department also supervises private charitable corporations 
and publishes statistics of the volume and cost of their work. 
Statistics of adult delinquents and criminals and of insanity and 
mental deficiency are not published by this Department, but by 
special commissions created for these types of work. 

The Indiana Board of State Charities was organized in 1889. 
At the time of its organization it was modeled to a considerable 
extent upon the corresponding department in Massachusetts, but 
it differs in some important respects. It has advisory authority only 
in so far as the conduct of public welfare work is concerned, with 
one exception: that boarding homes for children are licensed and 
subject to inspection by officers of the Board. Its only direct social 
work is done in placing and supervising dependent and neglected 
children in free homes. Its major function probably is the collec- 
tion of statistics from institutions and agencies, which are required 
by law to report to it at stated times concerning the work of the 
quarter or the year. The Board publishes a monthly bulletin, a 
large part of which consists of statistics, and the number of the 
bulletin which gives the annual report is almost entirely statistical. 
Statistics of crime and delinquency, mental disease and deficiency, 
dependent and neglected children, county general and tuberculosis 
hospitals, and indoor and outdoor poor relief appear in this num- 
ber. Comparative statistics for a number of years are usually given, 
and occasionally the annual report gives statistics by years as far 
back as they have been reported to the Board. The tables are well- 
prepared and intelligible. The one giving the reports of the 
county poor asylums distributes the population in these institutions 
by sex under the following headings: feeble-minded, insane, epi- 
leptic, paralytic and crippled, deaf, blind, senile, sick, able-bodied, 
total population at the end of the fiscal year, and total admissions 
during the year by counties. This kind of analysis makes the 
report particularly useful for student statistical analysis, because it 
discriminates the different types of persons requiring indoor poor 
relief. Another table gives the distribution by age and sex. The 
table presenting statistics of outdoor poor relief gives comparative 
data back to 1890 for some of the following headings: number of 

20 Annual Report, Department of Public Welfare of Massachusetts, November 
30, 19*8. 





families and of single persons; number of males and females; and 
the amount of outdoor relief each year for the state as a whole. 

The Illinois Department of Public Welfare, organized in 1917, 
publishes an annual report containing elaborate statistics of insti- 
tutions and agencies dealing with the insane, the criminal, the de- 
linquent, the dependent and neglected child, the handicapped 
child, the feeble-minded, and the poor. This report is of great 
value for studies in social statistics because of the detail presented, 
particularly in the case of statistics of crime and insanity. The 
statistics on crime are given in tables which show race and na- 
tionality, type of crime committed, age distribution, and educa- 
tional attainments. Statistics of insanity show age, sex, county of 
residence, race or nationality, type of mental disorder, marital 
condition, religion, duration of hospital residence, rate per 100,000 
population, and several other less important facts. The details con- 
cerning the insane are as complete as such things generally are in 
the report of a single hospital for mental disease. 

The annual reports of state departments of public welfare can 
usually be obtained free upon request. They offer a wealth of 
laboratory material for classes in social statistics, and some of them, 
such as the Indiana analysis of people receiving poor relief and 
the Illinois analysis of those in prisons and hospitals for the insane, 
may be used for more extensive statistical research. 

4. STATISTICS OF PRIVATE ORGANIZATIONS 

During the last twenty-five years there have come into existence 
a large number of private organizations, a part or all of whose 
work is social research. Some of them collect general statistics 
dealing with a wide range of subjects, but most of them use their 
statistics as the data for particular research projects in which they 
are interested. The publications of these organizations are of in- 
terest to students of social statistics from two points of view: first, 
the completed research project adds something to the accumulating 
knowledge of social institutions and social organization by the 
application of scientific methods; second, the statistics published as 
such, and not as finished research, are useful for further analysis 
and for their relation to other series of statistics. A short account 
of the work of several of these organizations will be given for the 
purpose of indicating the type of work the student will find on 
examining their lists of publications. 

One of the oldest and best known of these organizations is the 
Russell Sage Foundation, incorporated in 1907. The Foundation 
has as its purpose “the improvement of social and living conditions 
in the United States of America.” The means of achieving this 
purpose have been largely social research and the publication of 
the results. 21 One of the most recent publications of the Foundation 
is A Bibliography of Social Surveys, which lists upward of 
2,700 reports of surveys published up to January 1, 1928. Students 
doing research can consult the classified lists in this book and find 
out where and when other studies similar to their own have been 
made. If this is done and the reports examined, this book will 
contribute measurably to the scholarly character of research bear- 
ing upon social work. Another book, Employment Statistics for the 
United States, edited by Ralph G. Hurlin and William A. Berridge, 
presents a definite plan for the collection of employment 
and pay-roll statistics and suggests uses for them. It was worked 
out by the Committee on Government Labor Statistics of the 
American Statistical Association and represents a thoroughly 
competent judgment on the methods of collecting employment 
statistics. Occasionally a monograph is published, such as The 
Longshoremen by Charles B. Barnes. This is a study of working 
conditions, with special reference to the effects of seasonal varia- 
tions in the employment of longshoremen. It is not entirely 
statistical, but some parts of it are based upon statistics of employ- 
ment and earnings of this group of workingmen. The Social 
Workers’ Guide to the Serial Publications of Representative Social 
Agencies by Elsie M. Rushmore provides a check list of the 
publications of over 4,000 institutions and organizations, and is 
another very useful index of sources. Many other books and 
pamphlets have been published by the Foundation, but these will 
suffice to indicate the importance of its work for research workers 
and social administrators. 

Besides the publication of occasional books and pamphlets, the 
Department of Statistics of the Foundation began the collection of 
monthly statistics from relief agencies in a number of large Ameri- 
can cities; in February, 1932, 62 cities reported. Twenty-nine of 
these cities are represented by all the important relief-giving 
agencies; the reports for the others cover only a part of their 
agencies. These statistics are compiled monthly, published, and 
mailed to the cooperating agencies and other organizations and 

21 For a complete list of the publications of the Foundation see A Catalogue 
of Publications , issued by the Foundation in 1930. 





individuals who have arranged to get them. This project was 
started in 1926, about two years before the Registration of Social 
Statistics which the Children’s Bureau 22 is now conducting. The 
statistics collected by the Bureau cover all types of social agencies 
in the cities from which they get reports, whereas the Foundation 
is collecting only relief statistics, mainly those of private relief 
agencies, though some public agencies are included. The raw data 
thus collected have two uses: discovering trends in relief and calculating 
a seasonal index of relief. The material can be used to 
advantage for laboratory purposes in teaching social statistics. 
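
As a rough illustration of the two uses just named, the following Python sketch computes yearly totals (a crude indication of trend) and a seasonal index by the method of simple averages. The monthly figures are invented and are not Russell Sage Foundation data; the method shown is one common elementary approach, chosen here for brevity, and is not necessarily the one used by the Foundation.

```python
# Hypothetical monthly relief expenditures (thousands of dollars) for two
# years, January through December. Not Russell Sage Foundation data.
relief = {
    1930: [120, 118, 110, 100, 90, 85, 80, 82, 90, 100, 110, 125],
    1931: [150, 148, 138, 126, 114, 106, 100, 102, 112, 126, 138, 156],
}

def seasonal_index(monthly_by_year):
    """Seasonal index by the method of simple averages.

    Each calendar month's average over the years is expressed as a
    percentage of the average of all twelve monthly averages, so the
    twelve index values average 100. Trend is ignored in this sketch.
    """
    years = list(monthly_by_year.values())
    month_means = [sum(year[m] for year in years) / len(years)
                   for m in range(12)]
    grand_mean = sum(month_means) / 12
    return [100.0 * mean / grand_mean for mean in month_means]

def annual_totals(monthly_by_year):
    """Yearly totals, a crude indication of the trend in relief."""
    return {year: sum(values) for year, values in monthly_by_year.items()}

index = seasonal_index(relief)
totals = annual_totals(relief)
print([round(v) for v in index])
print(totals)
```

With these invented figures the index is lowest in July and highest in December, and the yearly total rises from 1930 to 1931. A more careful treatment would remove the trend before computing the seasonal index.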

The only organization engaged exclusively in population re- 
search, as opposed to the mere collection of population statistics, 
is the Scripps Foundation for Population Research operated in 
connection with Miami University. Its work is based largely upon 
statistics gathered from all parts of the world. Its purpose is not 
to publish statistics per se, although in its publications a good deal 
of statistical material is given in tables which may be useful to 
other students in their own work. Two examples will show the 
type of work done by the staff. 23 Mr. Whelpton’s object, as the 
title of his article suggests, is to estimate the growth of the popu- 
lation of the United States for the next fifty years. Beginning with 
the population as shown by the census, he uses birth rates, death 
rates, immigration statistics, and those of rural-urban migration 
for his estimates. The results of these estimates are given in tables, 
and a full explanation of the method accompanies their descrip- 
tion. Mr. Thompson is concerned with the effects of the changing 
rates of growth of national populations upon the control of the 
land area of the earth. He gives tables showing the age distribu- 
tion, birth rates, death rates, and natural-increase rates of most 
of the principal nations of the earth, and points out that a struggle 
among nations for control of the land is likely to come because 
of the differential rates of population increase. Such an analysis 
and such data as are presented in this article may have a bearing 
upon many social problems now being studied. 

The National Bureau of Economic Research is another organiza- 
tion whose purpose is research — statistical and otherwise, but par- 
ticularly statistical. The Bureau has been especially interested in 

22 See p. 49. 

23 “Population of the United States, 1925 to 1975,” by P. K. Whelpton, American 
Journal of Sociology, Vol. XXXIV, No. 2, pp. 253-270. “Population,” by 
Warren S. Thompson, American Journal of Sociology, Vol. XXXIV, No. 6, 
pp. 959-975. 




business cycles, the distribution of income in the population, and 
variations in employment. Two publications, Trends in Philanthropy, 
by Willford I. King, and Corporation Contributions to 
Organized Community Welfare Services, by Pierce Williams and 
Frederick E. Croxton, deal with problems of particular concern 
to social welfare agencies. Its publications dealing with variations 
in employment are especially important for the social administrator 
whose volume of work declines in times of high employment and 
rises during low employment. The income studies reveal the fact 
that 98 per cent of the population receives about 85 per cent of 
the annual income, whereas 2 per cent of the population receives 
about 15 per cent of the annual income. From the lowest income 
classes arise many of the social problems with which agencies have 
to deal. 24 Elaborate tables on income are given in Leven and 
King’s book, as well as in other publications of the Bureau. The 
publications are useful as secondary statistical sources as well as for 
the conclusions derived from the analysis of data. 

Another organization chiefly interested in carrying on research 
projects, but which has published some statistics as such, is the 
Institute of Social and Religious Research. It was organized after 
the collapse of the Interchurch World Movement to salvage the 
material collected by this organization, and it has gradually de- 
veloped upon broader lines. Its research has to a considerable 
extent been concerned with rural social questions. Several of its 
publications deal with the rural church, and a few have been reports 
of surveys of urban churches. One, Middletown, by Robert 
S. and Helen M. Lynd, has received the widest attention; this is 
an experimental application of the methods of social anthropology 
in a case study of a small industrial city. In all the publications 
of the Institute statistical material has been used extensively, and 
some of it is presented in such form that it may be utilized in con- 
nection with studies by other workers. The publication most useful 
as a statistical resource is American Villages, by C. Luther Fry, 
which is composed of population statistics of small towns. These 
statistics are not available in any of the publications of the Bureau 
of the Census. A special tabulation of certain data in the files of 
the Census Bureau was necessary to get the material for this book, 
and for this reason it is a particularly important source for a cer- 
tain kind of population data for the social statistician. 

24 Income in the Various States, by Maurice Leven and Willford I. King, 
1925. See p. 291ff. for the above estimates. 





There is frequent need of different kinds of index numbers in 
the study of social statistics, and the best source for all kinds of 
index numbers is probably the Standard Statistical Bulletin pub- 
lished monthly by the Standard Trade and Securities Service. The 
table of contents in the 1930-31 edition contains 326 classifications 
of economic statistics and economic indexes, under each of which 
are from 1 to 98 subdivisions. Standard republishes in its Bulletin 
the principal index numbers of production, sales, and prices which 
are made by all other economic statistical organizations. The social 
statistician is not often called upon to compute his own economic 
indexes. It may be necessary in carrying on a particular kind of 
research, but he can usually find a suitable index which has been 
computed by specialists. Therefore, a few of the indexes he is most 
likely to use will be presented and their uses suggested. 

Probably more frequent use is made of price indexes than of 
any other type, for in many investigations the social statistician is 
concerned with costs over a period of time. Since the purchasing 
power of the dollar is continually changing, it is necessary to 
reduce actual expenditures to comparable dollars. For example, 
when a state institution or department or a federal department is 
seeking increased appropriations to carry on its work, it is impor- 
tant to show the legislators that what is asked for is partly to 
maintain the same standard of work in a period of rising prices 
that was formerly attained with less money. Conversely, when 
prices are falling, the legislature may insist upon reducing aggre- 
gate appropriations because as time passes the dollar has greater 
purchasing power. One of the price indexes published by Standard 
is the Index of General Prices constructed by the Federal Reserve 
Bank of New York. According to this index, $1.00 in 1913 would 
buy as much as $1.79 in 1929. Thus, if a state department re- 
ceived $1,000,000 in 1913, it would need $1,790,000 in 1929 to 
maintain the same standard of work, assuming the degree of 
efficiency to be the same at both periods. In 1929 a dollar bought 
only a little over half as much labor, wholesale and retail com- 
modities, and rent as it did in 1913; its purchasing power had 
decreased markedly. After the onset of the 1929 business depression 
prices began to decline more rapidly; that is, the purchasing 
power of the dollar rose. Legislatures in 1931 reduced many 
appropriations from the level of the preceding biennium, not, 
however, because prices were falling but because the public demanded 
reduced taxes. If prices have at present fallen sufficiently, 
it may be that little or no retrenchment is necessary on the part 
of state institutions to keep their work up to the standard of the 
last two years. 
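
The reduction of expenditures to comparable dollars can be sketched as follows. The only figures used are those given in the text: the index stands at 100 in 1913 and, since $1.00 then bought as much as $1.79 in 1929, at 179 in 1929. The function names are hypothetical, and the sketch is a minimal illustration and not a reproduction of the Federal Reserve Bank's computations.

```python
# Index of General Prices (1913 = 100). The 1929 value of 179 follows from
# the text's statement that $1.00 in 1913 bought as much as $1.79 in 1929.
GENERAL_PRICE_INDEX = {1913: 100.0, 1929: 179.0}

def equivalent_dollars(amount, from_year, to_year, index=GENERAL_PRICE_INDEX):
    """Dollars needed in to_year to match the purchasing power of
    `amount` dollars in from_year."""
    return amount * index[to_year] / index[from_year]

def purchasing_power(year, base_year, index=GENERAL_PRICE_INDEX):
    """Purchasing power of one dollar in `year`, in base_year dollars."""
    return index[base_year] / index[year]

needed_1929 = equivalent_dollars(1_000_000, 1913, 1929)
print(round(needed_1929))                       # 1790000
print(round(purchasing_power(1929, 1913), 3))   # 0.559
```

The second figure restates the remark that a 1929 dollar bought only a little over half as much as a 1913 dollar.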

An index of the cost of living is sometimes useful in social 
statistics, and the one most widely used is that published by the 
United States Bureau of Labor Statistics and republished by Stand- 
ard. An index of the cost of living differs from an index of whole- 
sale or retail prices, or the general price level. It is nearest to an 
index of retail prices, but usually the weighting scheme is different 
and that affects the index as finally determined. The cost-of-living 
index is computed from retail prices of goods bought by families 
for consumption, and the quantities consumed are weighted by the 
relative importance of the commodity in the family budget. It is 
useful to relief agencies, which intend to maintain a minimum 
standard of physical efficiency in their clients, when the size of an 
allowance is determined. In a number of cities the relief agencies 
have studied local retail prices and have made up their own 
budgets on the basis of the cost of living, but in most localities this 
has not been done. The index of the cost of living as computed 
by the Bureau of Labor Statistics might be used to advantage by 
small agencies and public poor relief administrators to determine 
the amount of money required to purchase food, clothing, shelter, 
fuel, etc., sufficient to maintain physical efficiency. Enlightened 
employers might even use it as one factor in setting the minimum 
wage level. 
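
One common way of applying such budget weights is a weighted average of price relatives, sketched below. The budget shares, the price relatives, and the base allowance are all hypothetical figures chosen for illustration; they are not taken from the Bureau of Labor Statistics, and the Bureau's actual procedure may differ in detail.

```python
# Hypothetical budget shares (relative importance of each group in the
# family budget) and hypothetical price relatives (current retail price
# divided by base-period price). None of these are published figures.
budget_weights = {"food": 0.40, "clothing": 0.15, "shelter": 0.25,
                  "fuel": 0.10, "other": 0.10}
price_relatives = {"food": 1.20, "clothing": 1.10, "shelter": 1.00,
                   "fuel": 1.30, "other": 1.05}


def cost_of_living_index(weights, relatives, base=100.0):
    """Weighted average of price relatives, scaled so the base period = 100."""
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[item] * relatives[item] for item in weights)
    return base * weighted_sum / total_weight


index = cost_of_living_index(budget_weights, price_relatives)

# A relief allowance that was adequate in the base period (hypothetical
# 60 dollars a month) is scaled by the index to keep the same standard.
base_allowance = 60.00
adjusted_allowance = base_allowance * index / 100.0

print(round(index, 1))               # 113.0
print(round(adjusted_allowance, 2))  # 67.8
```

Because food carries the largest weight in this hypothetical budget, a given rise in food prices moves the index more than the same rise in fuel or clothing, which is the sense in which the weighting scheme "affects the index as finally determined."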

Indexes of employment, wages, and general production are of 
less obvious use to social administrators and statisticians than those 
of prices and of cost of living, but they may be needed occasion- 
ally. Wage indexes are important in all social problems touching 
on the standard of living. For instance, if wages go up at the same 
rate as the cost of living, the standard of living is being main- 
tained but not improved; if the cost of living rises faster than 
wages, then the standard of living is being reduced. Because the 
volume of employment responds quickly to changes in production, 
employment and production indexes may be useful in many ways, 
though not so obviously as certain other indexes. Declining pro- 
duction is soon followed by declining employment, and declining 
employment results within two or three weeks in a rise in the 
number of families receiving charitable relief. Thus, the close 
study of the trend of production indexes would be a general guide 
to social agencies in looking ahead, if not in making definite plans.
Crimes against property can be forecast in some degree by the use
of a general business index. When a depression is seen coming, it 
might be a good time for police departments to add a few men 
to the force and to cover more carefully the areas in which crime is 
most prevalent. Other uses could be found for such indexes. 
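
The comparison of wages with the cost of living described in this paragraph amounts to dividing one index by the other. The sketch below assumes both series are expressed on the same base period (= 100); the index numbers used are hypothetical, and the tolerance for calling the standard "maintained" is an arbitrary choice.

```python
def real_wage_index(wage_index, cost_of_living_index, base=100.0):
    """Wage index deflated by the cost-of-living index (base period = 100)."""
    return base * wage_index / cost_of_living_index


def standard_of_living(wage_index, cost_of_living_index, tolerance=0.5):
    """Classify the trend described in the text from the two index numbers."""
    real = real_wage_index(wage_index, cost_of_living_index)
    if abs(real - 100.0) <= tolerance:
        return "maintained"
    return "improved" if real > 100.0 else "reduced"


# Hypothetical index numbers, both on the same base period.
print(standard_of_living(120.0, 120.0))  # maintained
print(standard_of_living(110.0, 125.0))  # reduced
print(standard_of_living(130.0, 115.0))  # improved
```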

5. STATISTICS OF INDIVIDUAL AGENCIES AND INSTITUTIONS 

Most social agencies and institutions make annual reports of 
some kind, and some of them publish statistical material at irregu- 
lar intervals. The annual reports of state institutions are likely to 
contain tables which may be of considerable value to students of 
social problems in their own states, because they generally give 
more detailed statistics than a central collecting agency publishes. 
This is notably true of the reports of hospitals and prisons in some 
states. The further removed a statistical report is from the agency 
or institution which made the original records, the less detail there 
will probably be in it; and because this is generally true, it is de- 
sirable for students to have access to some of the reports put out 
by the institution or agency which made the original records. 

The Cleveland Health Council is an example of a social agency 
which publishes statistical studies at irregular intervals. The Coun- 
cil is the research and coordinating agency of all the principal 
health agencies in Cleveland. These agencies require many studies 
of public health matters. The Council has published studies of 
density and fluctuation of population in different parts of the city, 
distribution of inhabitants by country of birth, distribution of cases 
of influenza from December 1, 1928, to January 31, 1929, and of 
mumps from August 1, 1926, to July 31, 1927, distribution of 
families served by family case-work agencies in 1928, and other 
special studies. 25 The population returns of Cleveland, largely 
through the efforts of Mr. Howard Whipple Green, Director of 
Research and Secretary of the Health Council, have been tabu- 
lated by census tracts for 1910, 1920, and 1930, and the Health 
Council publishes these data with a street index. The Council also 
publishes an occasional supplement in intercensal years, giving 
estimates of the population by tracts. All this material has a variety 
of practical uses. 

The Institute for Juvenile Research of Chicago has published 

25 For maps showing the above studies, see “Facts, Figures and Fiction in
Social and Health Statistics,” by Howard Whipple Green, in New England
Journal of Medicine, Vol. 202, No. 16, April 17, 1930, pp. 771-775.





some highly important studies of the distribution of crime and 
delinquency in that city. The plan of these studies has to a large 
extent followed the theory of “human ecology,” that is, the study 
of the distribution of groups of the population and the forces which 
caused these groups to segregate and to survive as they have. The 
most interesting of these studies from the point of view of the 
social statistician is Delinquency Areas by Clifford R. Shaw and 
his associates. 26 Maps show the utilization of the land covered by
the city, and the residence of each delinquent is indicated by a dot 
on the maps. This device shows the points of concentration of 
delinquency, and makes clear the relation of delinquency areas to 
the railroads and industries. Delinquency and crime are thus shown 
to be heavily concentrated in the downtown sections of the city 
and around the stockyards, but the concentration decreases rapidly 
as one gets away from the Loop District. Thus conditions which 
favor the residence of criminals and delinquents are emphasized 
by this method of study. Students of crime statistics will find this 
and other studies of the Institute suggestive for their own work. 

The Research Bureau of the New York Welfare Council at 
irregular intervals publishes statistics which are useful to students 
in New York and elsewhere. These statistics usually relate to 
specific social problems in Greater New York, but often they have 
an important bearing upon general problems concerning other 
cities as well. The Council has recently published A Guide to Statistics
of Social Welfare in New York City, by Florence Dubois,
listing hundreds of studies of social problems made in New York,
and making a knowledge of this work easily accessible to anyone
wanting to consult one or more of these studies.

Social Science Abstracts, a monthly journal of abstracted magazine
articles on social science and social work, has in each issue a
section on “Research Methods,” and one of the divisions of this 
section is devoted to statistical methods. Many of the articles are 
statistical studies of social problems with which every social worker 
and social investigator is concerned. At the end of each year an 
index number of Abstracts is published, listing by subject and 
author all the abstracted articles appearing during the year. This 
journal should be consulted at an early stage in any study, to see 
what has already been done on the problem under consideration. 

A number of the larger cities of this country have created city 

26 Delinquency Areas, Clifford R. Shaw, et al. University of Chicago Press,




census committees whose purpose is to get the census returns 
tabulated by tracts and then to publish this material in a form 
available for administrative and research uses. The census tract is 
a small area, varying in size in the same city and in different cities, 
laid out with a view to enclosing a small homogeneous population 
— one similar as to race, nationality, economic status, etc. The New 
York Census Committee, the first of these committees, published 
the returns of the 1910 and 1920 censuses by sanitary districts 
which were accepted as adequate census tracts. The volume con- 
taining the 1920 New York census data distributed by census tracts 
was as large as the population volume issued by the Bureau of the 
Census for the whole country. The census-tract plan makes possible 
highly dependable statistical studies of social and health problems 
in large cities, and 15 of our larger cities used this plan in 1930. 
The published volumes, which can usually be bought from the 
census committees in the respective cities, furnish a basis for a 
great deal of social research, and provide data that is of ines- 
timable value for teaching purposes. Because of their detail, these 
volumes are more useful for teaching and for intensive research 
in particular cities than is the material published by the Bureau of 
the Census. 


6. STATISTICAL ORGANIZATION 

An examination of the history of any organization whose object 
is the collection or analysis, or both, of statistics will reveal a 
growth. The organization frequently was created for a specific 
purpose and became a general collecting agency, or it started out 
to do one thing and developed over a period of years so that it 
does many others not conceived in its initial stages. For example, 
as has been said, the United States census began as an enumeration 
of the population for the purpose of apportioning representatives 
in Congress. For 110 years the census organization was set up
every ten years, and when the collection, tabulation, and publica- 
tion of the returns were completed it was disbanded. However, 
additional items were gradually added to the enumeration sched- 
ule, and in 1900 the organization was made permanent and estab- 
lished as the Bureau of the Census. At first only decennial data 
were collected; now the Bureau collects several kinds of data
which are published monthly or annually. The Institute of Social 
and Religious Research has extended its work and has included 
in its scope many studies not even thought of by the original
organization. However, allowing for the certainty of change in any
statistical organization as time passes, some characteristics are 
common to most of them, and these may be set forth briefly. 

Three kinds of statistical organization may be distinguished 
roughly by the purpose of collecting the statistics: (1) statistics as
bookkeeping; (2) statistics for general use; and (3) statistics for 
special research. 

A small agency engaged in some kind of social work keeps 
certain records of its activities. It has no intention of making any 
systematic statistical analysis, and the volume of data accumulated 
probably does not warrant such a plan. Its aim is to do social book- 
keeping for its own purposes, and little formal organization is 
required. The administrative head or a staff worker may record 
information on regular forms or may keep the data in a sort of 
day-book. At the end of the year or at some other interval when 
a summary report is needed, the same individual may add up the 
cases, classify them according to certain attributes, and present 
them in written form. A larger agency or institution, like a state 
hospital or a prison or a family relief society in a large city, may 
have no intention of publishing statistics for general use or en- 
gaging in extensive statistical analysis. In other words, it may 
merely do social bookkeeping on a larger scale than the small 
agency. Where 1,000 or more individuals or families are dealt 
with in a year, some statistical organization is necessary, if the 
bookkeeping is to be of any use in showing the volume of work 
or in influencing policies. The day-book plan of recording is useless 
for a quantitative analysis of the work of the agency or institution, 
though as a matter of fact it is employed by public poor relief 
officials quite generally because they are concerned with individual 
cases and not with quantitative summaries of the work done. If 
any constructive use is to be made of the bookkeeping, regular 
forms for recording data must be devised and carefully filled out. 
The items to be recorded need to be defined carefully so that one 
record will be comparable with another. One or more clerks whose 
principal work is to keep the records and file them systematically 
are needed, and they should know enough, or be taught enough 
on the job, to enable them to tabulate the data in summary form. 
The data are then ready for further analysis and study, which 
may be done by the administrator or by a staff worker who has 
had statistical training. The most important step in this simple 
organization is preparing the record forms and defining the items
to be recorded. For several years the Russell Sage Foundation
has been trying to devise a satisfactory statistical card for family 
case-work agencies, but so difficult is the problem that the card 
has been revised several times. Most demographic items can be 
defined rather easily, but those relating to the work of the agency 
or institution are often difficult of accurate definition, and they are 
the ones in which the agency or institution is most interested. 
When this step in statistical organization is satisfactorily taken, 
the others follow with ease. 

Organizations which publish statistics for general use — public 
departments and bureaus and certain privately supported bureaus 
— must be more elaborately organized. A department of public 
welfare which collects statistics of the institutions and agencies in 
a state has to have a variety of forms. For example, if there are 
six state hospitals for mental diseases, it is important that all of 
them use the same form so that their returns will be comparable;
fortunately a standard form for such institutions is widely used 
throughout the country, which simplifies the work of a state de- 
partment. But the state department must devise forms for reports 
from poor asylums, township or county poor relief officials, and 
others for whom no standard statistical form exists. The persons 
who handle these forms have little or no training in record-keeping 
or in summary statistical reporting, and frequently they are not 
very intelligent; all this adds to the difficulties of the departments.
Statistical clerks in the departmental offices must be better
trained and able to detect and follow up errors in reports until they 
are corrected. Such a department ought to have a full-time statis- 
tician to supervise the collection of the material and tabulate and 
analyze it, but some of them do not have such a specialist. Much
time would be saved and accuracy increased, if the department 
used punching and sorting machines, for these save labor, reduce 
errors in tabulation, and enable it to make much more worth-while 
analyses of the data collected. The Bureau of the Census at Wash- 
ington has the largest statistical job of any organization in the 
country, since it tabulates not only the data for 122,000,000 individuals
in the population census but hundreds of other series of
data. It now uses 166 of these machines. The larger the organiza- 
tion, the more professional statisticians it requires, even though 
the data are not analyzed and presented as special studies. 

The organization is somewhat different when the purpose is 
special social research. New forms must be devised for each separate
study. A basic staff of clerks and statisticians is necessary, but
experts who are familiar with the special field of investigation and 
who have had statistical training are of vital importance. It is a 
common opinion among workers in the social sciences that a good 
research man must know his subject first, and then know statis- 
tical methods. If it is impossible to find some one possessing both 
these qualifications, it is better to select one who knows his field, 
letting him learn statistical methods as opportunity permits. How- 
ever, with the increasing emphasis upon statistics in social work 
and the social sciences it is usually not difficult to find a specialist 
who also possesses statistical training. The same set-up of me- 
chanical equipment is necessary in a research organization as in 
an executive department or bureau, and whatever other equipment 
and personnel will add to efficiency should be obtained. When 
funds restrict the scope of the work, as they usually do, it is better 
to limit the work undertaken and provide adequately whatever is 
necessary to do it well. 



CHAPTER III 


The Nature of Statistical Research 


Statistical method is only one of the methods by which social 
research is carried on. The historical method may utilize statistical 
data or it may not. In the past history has been largely depictive;
it has given a picture of a period of social life and has shown in 
general the sequence of events. This has been the work of a literary 
artist who had a flair for discovering facts and weaving them into 
a coherent pattern. In more recent years, however, some historical 
writing has made considerable use of statistical material, notably 
in economic and social history. Effort has been made to write the 
recent history of social and economic institutions by assembling 
data in a sufficient quantity and of such homogeneity that the 
depictive method would not be of great importance. Philosophical 
social research usually relies upon history for its data and is merely 
a special form of history, or it has attempted to generalize from 
impressions made in the study of a wide variety of contemporary 
facts. Philosophical research has an important place in the social 
sciences as a critical tool; particularly is the criticism of the logic 
of the social sciences an important function of this kind of research. 
But the method of research which is more often contrasted with 
statistics is case study. Most historical writing is based upon case 
study, and the generalizations which have proceeded from philos- 
ophy have on their factual side their basis in assumptions about 
cases. Qualitative and quantitative research are frequently set out 
as combatants in the field of social research, but that is a mistaken 
conception. Professor Wesley C. Mitchell said in his presidential 
address to the American Economic Association in 1924, “In the 
thinking of competent workers, the two types of analysis will 
cooperate with and complement each other as peacefully in eco- 
nomics as they do in chemistry.” 1 In this respect as much may be 

1 Mitchell, W. C., “Quantitative Analysis in Economics,” American Economic
Review, Vol. XV, No. 1, p. 12.




said for the value of qualitative and quantitative methods in social 
work and in the other social sciences. All valid methods of re- 
search applicable to social data will be found useful in the social 
sciences. It is the purpose of this chapter to set forth briefly the 
nature of that form of social research which is called statistical 
research. 


1. THE NATURE OF CASE STUDY

Because in recent years there has been a good deal of controversy 
in the social sciences over the relative merits of case study and 
statistics and because many students of social statistics will also do 
case studies and case work, a more extended discussion of this 
subject is desirable. A case is an individual instance. It may be a 
delinquent boy, a family, a community, or even a nation. What- 
ever it is, it is a single unit of the kind under examination. Upon 
this subject Professor Giddings has illustrated a case in the fol- 
lowing manner: “The case under consideration may be one human 
individual or only an episode in his life; it might conceivably be
a nation or an empire, or an epoch in history. The cases with which 
social workers are apt to be concerned are individuals, families, 
neighborhoods and communities. The cases in which the ethnolo- 
gists, historians, and statesmen are apt to be interested are non- 
civilized tribes, culture areas, historical epochs and politically 
organized populations. Demographers are concerned with the evo- 
lution and degeneration of populations in respect of their biological 
and psychological quality, and of their vitality.” 2 Some of these
“cases” will appear unusual to the student. But if the concept of 
the case as a single unit of the kind under study is kept clear, the 
accuracy of Professor Giddings’ illustrations will be obvious. Suppose
a student of population, known as a demographer, is concerned
with the growth of the population of a country like the
United States. The United States is a nation, and among all the 
nations of the earth it is a case. It is one of its kind, a single 
national unit with differences that distinguish it more or less 
sharply in population from other nations. Statistical data will be 
used to study this case of population growth, but the statistics are 
data concerning the individual human being, not nations as a class. 
In a statistical study of population growth of all the nations of the 
earth, each nation would be an item in the series, in other words,

2 Giddings, Franklin H., The Scientific Study of Human Society, p. 95. Univer- 
sity of North Carolina Press, 1924. 





a case. Statistics is, therefore, a method of dealing with a large 
number of cases at once, or it is the method used to consider the 
distribution of a single trait among many cases of a similar kind. 

This is perhaps the essential distinction between case study and 
statistical study. Statistics analyzes the distribution of cases as units 
or as single traits of cases. Case study analyzes the combination of 
all traits in a particular case. If this distinction is valid, and it is 
the one coming to be held by students of the social sciences, there 
can be no controversy over the usefulness of both methods. They 
have different functions as methods of research. A case study re- 
quires as complete a description as possible of all the facts con- 
ceivably pertinent to the case. Some of the facts, like age, will be 
objective and quantitative; others, like attitudes, will be qualita- 
tive. But both kinds are necessary to form a judgment about the 
case. An illustration will make this clear. It is drawn from the 
field of family case work. Here is a family composed of the father, 
mother, three small boys and two girls, the oldest of whom is 
twelve years of age. The man is out of work, the family is out 
of food, and the rent is overdue. They have asked for financial 
relief. Obviously relief is necessary and will be given. But it is 
the business of the case worker to find out why the family is in 
such circumstances and to devise a plan for rehabilitating it. The 
name, age, and sex of each member of the family are obtained. 
The father has no regular occupation but is a laborer and works 
at whatever he finds. For eleven years the family has lived in 
the same tenement of two rooms. Four of them sleep in one bed 
and three in another. The mother seems to be mentally slow, and 
the oldest child, a girl, appears to be mentally retarded. Is this 
the first time the family has had to ask for charitable relief? A 
check through the social service exchange indicates that it probably 
is, because none of the relief agencies in the city has registered it 
previously. One thing is clear; for a number of years the family 
has made its way, and that is a basis of hope for the case worker 
that the family may be restored to self-maintenance. But the explanation
of the present condition must be found. The names of
previous employers of the father are secured; later they are interviewed.
One for whom he worked nine years explains that he had
been a steady worker but that a few months before he was dis- 
charged he gradually became too independent and ceased to get 
along with his foreman. The only thing the employer could do 
was to let him go. The man’s jobs after that had been temporary.
He had been discharged a few times and had quit several without
any adequate explanation. Further investigation showed that the 
man had been drinking more in recent years and frequently was 
drunk for several days at a time. The mother and the children 
were examined, and the mother and two children were found to 
be feeble-minded. These were the facts that formed the case 
worker’s basis for making a plan of social treatment. Some of these 
facts are objective and quantitative, but qualitative factors enter 
into the explanation of the man’s behavior. Both kinds of facts are 
important for social case work with this family. It might even be 
helpful to compare this case with some other having similar char- 
acteristics in certain respects. Complete description and comparison 
of case with case are the two fundamental principles of case study, 
and they are the indispensable tools of the case worker. 

The procedures of case study and of statistics have been dis- 
tinguished, but it remains to evaluate case study as a scientific 
method in the social sciences and in social work. First, then, as to 
what case study cannot do: it cannot generalize. There is no valid 
basis in its procedure and in the facts obtained for a generalization 
that would apply to other cases, for the simple reason that the 
other cases to be generalized about have not been included in the 
study. Here the concept of “population,” as it is used in statistics, 
will help to clarify the point. The term population in statistics 
refers not only to human population, but to any kind of objects 
under study. The population in a city might be the total number 
of people, or the total number of business establishments, or the 
total number of children of school age. If we are studying a 
problem that concerns an entire city and it is desirable to draw 
conclusions applicable to the whole city, either all the “population” 
must be included in the study or a representative sample of the 
whole population must be taken. The sampling method is gen- 
erally resorted to, because it is reliable (see Chapter XII), re- 
quires less time, and is less expensive. If the problem concerns a 
particular city, then the objects for study in that city are the 
population. If the problem concerns a state, the objects for study 
are all such objects in the state. If the sampling method is used 
instead of a complete study of all the objects, the sample must be 
selected carefully. A carefully selected sample for study permits 
conclusions which are applicable to the entire population. But 
neither a study of all the objects nor a sample of this population 
permits conclusions about the population of another city or state.
Other cities and states were not sampled or studied as a whole.
For the same reason a case does not permit conclusions which can 
be applied to other cases, because the single case was the popula- 
tion. The other cases were not studied as a whole, and one case 
is not enough to be called a sample. Conclusions valid for the 
particular case may be drawn, but no generalization can be made 
concerning other cases. Generalization requires the study of a 
large number of cases which are representative of the whole popu- 
lation. In view of the fact that case study involves qualitative 
factors which cannot be reduced to statistical data, the only way a 
large number of cases can be treated statistically as wholes is to 
regard the judgments or conclusions about cases as statistical data. 
This may be done, but it can be done by none but experts, and 
even then the results may be questioned. Hence for practical pur- 
poses we may say that case study does not permit generalization. 
That is a function of statistical study and is limited to generaliza- 
tion from quantitative data. 
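
The sampling idea in this paragraph can be illustrated with a small sketch. The "population" below is wholly hypothetical (10,000 families in one city, of whom 1,200 receive relief), simple random sampling is only one of the methods treated in Chapter XII, and the function names are illustrative. The point is only that a carefully drawn sample supports a conclusion about the city from which it was drawn, and about no other.

```python
import random


def simple_random_sample(population, size, seed=None):
    """Draw `size` distinct units from the population, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, size)


def proportion(units, has_trait):
    """Share of the units for which has_trait(unit) is true."""
    return sum(1 for unit in units if has_trait(unit)) / len(units)


# Hypothetical population: every family in one city, 1,200 of them on relief.
families = [{"id": i, "on_relief": i < 1200} for i in range(10_000)]

sample = simple_random_sample(families, 500, seed=1933)

population_rate = proportion(families, lambda f: f["on_relief"])
sample_rate = proportion(sample, lambda f: f["on_relief"])

print(population_rate)  # 0.12, known here only because the data are invented
print(sample_rate)      # an estimate of the same rate from 500 families
```

A single case corresponds to a "sample" of one unit, from which no such estimate can be formed; and a sample drawn from this list of families says nothing about the families of another city, which were never in the list.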

But case study may be an important aid to statistics, since every 
statistical study is preceded by case study or accompanied by as- 
sumptions about cases. Such an apparently simple matter as the 
enumeration of the population of the United States assumes a 
knowledge of cases, namely, the human beings who constitute the 
population. They have age, sex, occupation, place of residence, 
etc., which may be enumerated or measured. This is, of course, 
superficial case study, but the knowledge which the Bureau of the 
Census has of these human cases enables the officials to decide that 
certain items, or factors, should be counted in the census. That is 
case knowledge, which comes from a consideration of cases and 
is an aid to factorizing for statistical purposes ; the factors of im- 
portance must be determined and defined. As such, case study 
even when superficial is an important step in all social research 
and particularly in statistical studies. 

It should not be concluded, however, that case study is merely 
the handmaiden of statistics. It has a function secondary to no 
other method in the social sciences. By means of case study control
over the development of events in individual cases is achieved
for practical ends of amelioration. The aim of all scientific investi- 
gation is to secure knowledge upon which control may be based 
either for practical or for scientific ends. In this sense case study 
is highly important in social work. In so far as the dependency in 
the case cited above is due to social factors, the study of this case
may be expected to lead to control; if biological conditions, such
as feeble-mindedness, are paramount to all others, then control of 
the conditions leading to dependency in this case may not be 
achieved, though through segregation or sterilization control could 
be achieved of the biological factors in so far as the next genera- 
tion is concerned. Taken in this sense, case study has its own 
independent value as a scientific method. 3

2. QUANTITATIVE DATA 

Statistics is concerned with quantitative data: the quantitative 
nature of social data may arise in two ways. One way is by count- 
ing, and the other is by measuring. 4 The fundamental difference
between counting and measurement is discussed below in connec- 
tion with the treatment of continuous and discontinuous variables. 
There may be some dispute as to which social facts can be counted 
or measured, and which cannot be treated in this way. Some quali- 
tative facts, such as attitudes, do not lend themselves easily to 
counting or to measurement, because their definition is not clear 
enough, though some students have tried to measure them by 
means of a scale. It might be questioned whether or not the num- 
ber of insane persons can be counted. Certainly the number in the 
total population never has been counted, but we do commonly 
count those in institutions. Until it is possible to connect insanity 
definitely with neurological changes, it will be a qualitative fact, 
but we shall probably continue to count the insane because of the 
tremendous importance of the problem. The concept of juvenile 
delinquency is also rather vague, even in the statutes, but this 
difficulty is somewhat overcome by classifying as a delinquent 
every child brought into the juvenile court for misbehavior. Age, 
weight, height, population per square mile, income, children en- 
rolled in school, criminals in institutions, families receiving charita- 
ble relief in a city, occupation, marriages, divorces, and many other 
social facts can be counted or measured and are generally accepted 
as quantitative facts. Statistical study of crime, delinquency, and 
insanity is of necessity restricted to the study of those individuals 

3 For a more extended discussion of the validity of case study as a method
in social science, see “The Relative Value of Case Study and Statistics,” by 
R. Clyde White, The Family, January, 1930, pp. 259-265; also Lundberg, G. A., 
Social Research, Chap. VIII. New York: Longmans, 1929. Also Chapin, F. S., 
“The Problem of Controls in Experimental Sociology,” Journal of Educational 
Sociology, 1931. 

4 In this connection see Yule, G. U., An Introduction to the Theory of Statis-
tics, p. 7. London: Charles Griffin & Co., Ltd., 1924.





who are legally defined as criminal, delinquent, or insane and are 
confined in an institution or otherwise taken under special super- 
vision. Hence we are dealing with court cases rather than with 
crime, delinquency, or insanity in general. Many social facts which 
are essentially qualitative and not amenable to quantitative treat- 
ment are given a quasi-objective status by the fiat of a court, cus- 
tom or other authority, and statistical studies are made of them. 
There can be no objection to this practice, but at all times it should 
be clear what facts are being considered. That is, the “population” 
must be clearly defined in the mind of the worker and in the 
mind of the public which may read a report of his work. 

The kind of quantitative data obtained by counting is known 
as an enumeration of attributes. We attribute some characteristic 
to the individuals (individual does not refer necessarily to a hu- 
man being but to any unit of a class of objects to be studied) being 
studied. The redness of an apple is an attribute, and so would be 
the blue eyes or light-colored hair of a Nordic. The blackness of 
the Negro and the yellowness of the Chinese are attributes. Na- 
tionality is an attribute: English, French, German, Russian. Be- 
cause these people live in a certain political unit and speak a 
certain language, the name of this political unit and the language 
spoken is attributed to the individuals, and English, French, Ger- 
mans, and Russians are counted by the census. Of course, being 
English or Russian implies many other qualitative facts besides 
nativity and language. The census is to a considerable extent an 
enumeration of attributes. The chief problem here is the discovery 
of precise definitions of the attributes. Who, for example, is a 
Negro? Is an octoroon a Negro? He has seven-eighths white blood 
and one-eighth Negro blood. Some Italians and other south Euro- 
peans have more pigment in their skins than octoroons, but they 
are not classified as Negroes. By convention anyone who knows 
he has a Negro ancestor is a Negro. On that basis the census is 
taken. Of course, some people who have a Negro ancestor in the 
distant past do not always return themselves in the census as 
Negroes, because in everyday life they pass as white people. Again,
who is an employed person? If he is laid off for a week at the time
the census is taken, is he employed? If he is on vacation, is he 
employed? If he works at an illegal occupation, like bootlegging, 
is he occupied? Once more, who is married? Is a man married, 
if he regularly lives with a woman but never bought a marriage 
license or had a marriage ceremony? Such questions indicate the
necessity of precise definition so that the attribute will everywhere
be recognized and, when counted, placed in the same classification. 

Another characteristic of statistical variables is that some of 
them are continuous, and some are discrete or discontinuous, vari- 
ables. In theoretical problems this distinction should be made, 
while in practical work it is not so important, though it ought to be 
understood. “A discrete variable,” says Rietz, “is one whose values 
differ by assigned steps, often by unity; for example, the number 
of children in a family, the number of kernels on an ear of corn. 
A continuous variable is one whose values may differ by amounts 
which are infinitely small; for example, the weight of a man, the 
temperature at a place.” 5 The number of children in a family 
would always be a whole number; there never would be 3½
children. The assigned step, as Rietz calls it, is one child. Upon 
first thought interest rates might be regarded as a continuous 
variable, but they are not so in practice. They are fixed by custom 
in terms of units, halves, quarters, and eighths of per cent. An 
interest rate of 4.247 per cent would never occur. But the weight 
or the age of a man may vary in amounts infinitely small. Weight 
might be expressed in pounds or kilograms, or it might be ex- 
pressed in grains or milligrams and fractions of these small unit 
measures. The continuous variable changes by amounts as small 
as the investigator may wish to use, and a curve representing such 
data could be smoothed with theoretical accuracy. The concept of 
discrete and continuous variables will appear again when frequency 
curves are discussed in Chapter VII and when probability is dis-
cussed in Chapter XII. 6
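
The distinction drawn above can be put in a small sketch. It is an illustration only, not anything given by Rietz: the function name is_on_step is invented here, and the assigned steps (one child, one-eighth of one per cent) are taken from the examples in the text. Exact fractions are used so that decimal rounding does not blur the test.

```python
from fractions import Fraction


def is_on_step(value, step):
    """Return True when value is a whole multiple of the assigned step.

    Both arguments are given as strings such as "4.25" or "1/8" so that
    they can be converted to exact fractions.
    """
    return Fraction(value) % Fraction(step) == 0


# Number of children in a family: the assigned step is one child.
print(is_on_step("3", "1"))        # a whole number of children
print(is_on_step("7/2", "1"))      # 3 1/2 children

# Interest rates fixed by custom in eighths of one per cent.
print(is_on_step("4.25", "1/8"))
print(is_on_step("4.247", "1/8"))
```

A continuous variable such as weight has no such step; any value the investigator chooses to record is admissible, and the only limit is the fineness of the measuring instrument.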

Quantitative data which may be measured are called variables 
by Yule. This concept of the variable is commonly held by statis- 
ticians. Yet in usage there is an exception, when attributes are 
treated as variables. In line graphs (see Chapter VII) the data 
plotted on both the horizontal and vertical scales are spoken of as 

5 Rietz, H. L., in Handbook of Mathematical Statistics, edited by H. L. Rietz, 
p. 20. Cambridge: Houghton Mifflin Co., 1924. 

6 The complexity of the problem of variables is too great to enter into a long
discussion of it here. It is perhaps the fundamental problem in higher mathe- 
matics. For a general discussion of the subject the student is referred to Russell, 
Bertrand, Principles of Mathematics, Vol. I, especially Chaps. I, VIII. Cam- 
bridge University Press, 1903. See also McMillen, A. W., Measurement in 
Social Work, University of Chicago Press, 1930; and Chapin, F. S., “The 
Meaning of Measurement in Sociology,” Proceedings of the American Sociolog-
ical Society, 1930, pp. 83-94, for specific applications of the subject to social
statistics. 





variables. Attributes are shown in frequency distributions and are 
plotted just as frequency distributions of variables in Yule’s sense 
are plotted. But with this exception, which has practical justifica- 
tion in technical procedure, the term variable will be used to refer 
to quantitative data which can be measured. Such facts as age, 
time, price, income, physical production, and perhaps levels of 
intelligence are true variables and are expressed in terms of mag- 
nitude. There are others, of course, but these will suffice for illus- 
trative purposes. Time is measured in seconds, minutes, hours, 
days, weeks, months, years and centuries. These are conventional 
divisions, but each bears a definite relation to the other; they are
measures of time for purposes of convenience and suffice for prac- 
tical purposes. Price and income are expressed in dollars, pounds, 
marks or francs, and each of these national monetary units is 
definite — the changing purchasing power of money is another 
problem. Physical production is measured in pounds, tons, ton-
miles, etc. Levels of intelligence are expressed in terms of the
ratio of the mental age to the chronological age. The question may
be raised whether the devices for measuring mental age are actual 
measuring sticks, but, if it is assumed that they are, then the intelli- 
gence quotient is a true variable. 

Variables may be classified as independent and dependent. This 
means that one series of facts to which another series is related is 
treated as cause, and that the second series changes in accordance 
with the first. Speaking of plotting variables on a graph, Brinton 
says, “One of the variables is used as a standard or measure by 
which to interpret the facts under consideration, and it may be 
called the ‘independent variable.’ The other variable, which is
interpreted from the independent variable, is called the ‘depend-
ent variable.’” And further, “It is difficult to make a general rule
for determining in any case which is the independent variable and 
which is the dependent variable. The decision depends entirely 
on how any set of data is approached and on the habits of mind 
of the investigator.” 7 Considering the way in which the term
variable is used in mathematics, Brinton’s statement is perhaps 
both too broad and too indefinite. If two variables are functionally 
related, the one regarded as the function of the other is obviously 
the dependent variable. Unemployment is unquestionably related 
to poverty in the case of workmen. It would be possible to measure 

7 Brinton, W. C., Graphic Methods for Presenting Facts, p. 84, McGraw-
Hill, New York, 1914.




the time each man in a group had been unemployed and to set 
up some scale for measuring his degree of poverty. If other factors 
entering into poverty are held constant, we may measure the inter- 
dependence of unemployment and poverty. The amount of unem- 
ployment in each case is the independent variable, and the degree 
of poverty in each case is the dependent variable. Poverty is in 
this sense caused by unemployment. In statistical language Y, 
poverty, would be a function of X, unemployment, and they 
would be used in that way, if the method of correlation is em- 
ployed to measure the degree of interrelationship. Sometimes in 
plotting two series of data it may be desirable to use as Y one 
series which in other connections would be treated as X. But this 
is a practical problem and does not alter the fact that the inde- 
pendent variable is always in a real sense independent and that 
the dependent variable is always in a real sense the function of the 
independent variable. One of the ultimate aims of social statistics 
is to predict events — a goal yet far in the future. When predic- 
tion is the objective, it is the behavior of the dependent variable 
that we want to anticipate. The independent variable is regarded 
as the cause of changes in the dependent variable. 
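
The arrangement of X and Y described above may be sketched as follows. The figures are invented for illustration and come from no study; the poverty scale is hypothetical, and the function pearson_r is merely one plain way of writing the product-moment coefficient of correlation. The sketch measures the degree of relationship only; treating X as the cause of Y remains an assumption of the investigator.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment coefficient of correlation between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and of squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# X, the independent variable: weeks of unemployment for six workmen (invented).
weeks_unemployed = [0, 2, 4, 8, 12, 20]
# Y, the dependent variable: score on a hypothetical poverty scale (invented).
poverty_score = [1, 2, 2, 4, 5, 8]

r = pearson_r(weeks_unemployed, poverty_score)
print(round(r, 3))
```

The coefficient itself is symmetrical: interchanging the two series leaves r unchanged, so the choice of which series to call X rests on the reasoning of the study and not on the arithmetic.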

3. MULTIPLICITY OF FACTORS 

In the discussion of variables the impression may have been 
given that the scientific study of society is simpler than it really 
is. The aim of pure social science is to find causes. Applied social 
science uses this knowledge of causes for the purpose of controlling 
events in the interest of human welfare. Social statistics comes to 
this problem in a different manner from that employed by the 
older social scientists. In the earlier literature of the social sciences 
cause was something fixed and could be discovered like the law 
of gravitation, but careful case studies and statistical analysis have 
led to the discovery of a more perplexing situation and to a hum- 
bler conception of social causation. It should be said also that case 
studies have contributed to this viewpoint. Why do we have a 
business depression? Some decades ago it would have been blamed 
on the government or on the bankers, but the study of business 
cycles has revealed the complex milieu in which a depression 
occurs. Social scientists hold a point of view which is not even 
yet held by the general public. In the middle years of the last 
decade it was announced through the press and from the platform 
that we should never have another depression. Yet one of the
worst depressions in the experience of the American nation occurred
in 1929. The social scientists had pointed out the beginnings of
this depression more than a year before it was generally admitted,
but even in 1932 they could not agree on an explanation. The
problem is too complex; the factors entering into it are many.
There was overproduction in certain basic lines; underconsumption
was cited as a cause; other suggestions have been scarcity of gold
in some countries and oversupply in others, mass production, Rus- 
sian dumping, decay of capitalism, a tariff that is too high, a 
tariff that is too low, etc. Not all of these have been advanced by 
sober economists, but they have been advanced by men in respon- 
sible positions. The one thing clear to social scientists is that nobody 
has a sufficient understanding of all the factors involved to explain 
the cause of this depression. 

A social condition is influenced by a multiplicity of things: social, 
psychological, physical, and biological. Amid such possibilities 
about the best the social statistician can do is to record facts which 
seem to him important and then observe the quantitative changes 
which occur in these facts. The rate of change increases, and the 
number of cultural factors grows. That is not as simple as it sounds, 
because sometimes there is doubt as to what are the facts, and 
particularly what are the significant facts. That, of course, is a 
problem common to the natural sciences also, but apparently it 
applies with greater force to the social sciences. We count the 
criminals in our institutions. They are studied by isolating various 
objective factors like age, sex, occupation, education, place of resi- 
dence, and previous criminal record. These facts are relatively 
easy to obtain. But are they the significant facts? If they are, what 
is the significant relationship existing among them? If not, are the 
significant facts psychological, psychiatric, or otherwise intangible 
and so non-statistical? Such questions suggest the meagerness of 
the present scientific achievement of the social sciences, but they 
also suggest the importance of more strenuous effort to factorize 
the social situation in a significant way and in a way that permits 
statistical analysis.

Experimentation has a limited use in the social sciences. Human 
beings object to being the objects of a mass experiment; they do
not submit to it like electrons or guinea pigs. Yet sometimes a 
condition is set up which might be a social experiment for the pur- 
poses of the statistician. A plan for building playgrounds in a city 
might be followed by the observation of the effects of these play-
grounds on the rate of juvenile delinquency in their neighbor-
hoods. That would not be set up as an experiment in the sense that
a chemist enters his laboratory and sets up an experiment, but it 
would provide a new factor in the social environment, and the 
observer might record changes in behavior, like delinquency, 
which followed the introduction of the new factor. The conditions 
of the experiment would have been provided by the city govern- 
ment and for the purpose of meeting a public demand for recrea- 
tion, and the social scientist would accept the set-up and utilize it 
for his own purposes without in any way disturbing those plans. 
The Eighteenth Amendment is sometimes called an experiment, 
even a “noble experiment,” but it is a social experiment on such 
a gigantic scale and involves such a complex of social factors that 
the efforts at statistical appraisal of it have so far been questionable. 
Nevertheless, in spite of the fact that the social scientist cannot 
set up many social experiments by himself, he may make use of 
such experiments as the two mentioned above to try his technique 
and to refine his methods. A few successful studies of such experi- 
ments might lead to much more extensive efforts of city, state, 
and federal governments deliberately to analyze the effects of an 
administrative policy or of new legislation. Sometimes, of course, 
they do not want to know the effects, but in other cases the po-
litical values might not be so potent. 8

The complexity of social situations serves to emphasize the need 
for statistical analysis. The politician, the statesman, and fre- 
quently the historian have explained social situations by “for- 
tuitous occurrences of special or isolated character which do not 
appear to operate or recur in any fixed order.” 9 The multifarious
occurrence of the same factor in different magnitudes requires a 
method of summary statement provided only in the graphs, the 
averages, the measures of variations, and the correlations of the 
statistician. The factors which show a long-time development in 
one direction, accompanied by short oscillations and by seasonal 
fluctuations, are the fundamental social data; the fortuitous events 
should also be considered, but only in proportion to their impor- 
tance in the social order. The student of social statistics needs to 
be fully aware of the innumerable social factors and their com- 

8 See Chapin, F. S., “The Problem of Controls in Experimental Sociology,” 
Journal of Educ. Soc., Vol. 4, No. 9, pp. 541-551. 

9 Rice, Stuart A., “The Historico-Statistical Approach to Social Studies,” in 
Statistics in Social Studies edited by Stuart A. Rice, Chap. I. Philadelphia: 
University of Pennsylvania Press, 1930. 




binations, but his attention should be directed to the discrimina- 
tion of the more significant factors and to experimentation with 
these by the methods of statistical analysis. 

4. HOMOGENEITY OF DATA 

Homogeneity in statistics refers to data of the same kind. Apples 
and potatoes cannot be added; nor will much be known about
apples by either seller or buyer if big apples and little apples, 
good apples and rotten apples, cooking apples and eating apples 
are all put into a single class. Knowledge of apples is gained by 
separating them according to certain attributes and then putting 
together those with like attributes. This homely illustration will 
suggest that social situations are analogous. Social facts of like 
kind must be put together. The greater the degree of likeness, 
or homogeneity, the more reliable will be the results of statistical 
analysis. Among sociologists the students of rural life have gone 
furthest in understanding their problems, and no small part of 
the explanation of this lies in the fact that they have factorized 
their problems with a view to statistical analysis and have selected 
their data with a careful eye to relative homogeneity. A farmer 
has been recognized as a human being, but he also has children 
who go to school, he is a churchman in many instances, he belongs 
to lodges and farmers’ organizations, he is a citizen and participates 
in the government of his community, he cultivates the land and 
raises livestock, and he buys and sells in an open market. But he 
does all these things in varying degree: some farmers make the 
most of the schools for their children; others do not. Some have 
a high general standard of living, while others grade down to a 
very low standard. The rural sociologists have been occupied with 
these problems and have been able to understand them, because 
they have analyzed their data into homogeneous classes. 

Homogeneity is a matter of degree, and the highest degree is 
to be desired. Any degree of heterogeneity introduces extraneous 
factors which have to be considered or neglected. If they are neg- 
lected, the validity of the results of the study is questionable to 
the extent of the influence of the extraneous factors. For example, 
in a study of crime in which the main interest is crimes against 
property, the conclusions would be vitiated if all crimes were in- 
cluded. Only crimes against property would be pertinent. But 
crimes against property are of many kinds: theft, robbery, bur-
glary, embezzlement, arson, wanton destruction, etc. Among pro-
fessional criminals it is known that a man usually specializes in a
particular form of crime against property, because in that way he 
can become more proficient. So it would improve the homogeneity 
of the data if criminals were separated according to the kinds of 
crimes against property which they committed. Age is sometimes 
important. For example, hold-up men are usually rather young, 
while embezzlers are more likely to be considerably older. Why 
is such an age division found? That is one of the problems to be 
studied. Further analysis leads to a greater refinement of homo- 
geneity. 
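
A brief sketch may show what is gained by the separation just described. The records below are invented for illustration, and the function name mean_age_by_class is hypothetical. The pooled average age describes no actual group of offenders, while the averages within each class of crime reveal the age division mentioned in the text.

```python
from collections import defaultdict
from statistics import mean

# Invented records: (kind of crime against property, age of offender).
records = [
    ("hold-up", 19), ("hold-up", 22), ("hold-up", 21),
    ("embezzlement", 41), ("embezzlement", 47), ("embezzlement", 38),
]


def mean_age_by_class(rows):
    """Average age within each class of offense."""
    ages = defaultdict(list)
    for offense, age in rows:
        ages[offense].append(age)
    return {offense: mean(values) for offense, values in ages.items()}


# Average over the heterogeneous collection, ignoring the kind of crime.
pooled = mean(age for _, age in records)
# Averages over the more homogeneous classes.
by_class = mean_age_by_class(records)

print(pooled)
print(by_class)
```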

The United States Bureau of the Census intends to get enough 
facts about each person so that demographic studies of the United 
States may have a maximum of dependability. The original pur- 
pose of the census was to determine who should vote and how to 
apportion representatives to Congress, but anyone who answered 
the questions of the census enumerator in 1930 knows that many 
more facts are now asked. Additional facts are wanted now by the 
Bureau of the Census in order that we may know more about the 
population, but, boiled down to the lowest terms, they are wanted 
for various statistical purposes requiring homogeneous groups of 
facts about the population. The variety and uses of the census 
material were pointed out in greater detail in Chapter II. 

The degree of homogeneity possible varies according to 
whether the item is a true variable or an attribute. A true variable 
can be measured in terms of length, area, volume, weight, time, 
money units, pressure, etc. Approximate accuracy is possible here, 
but ultimate homogeneity is doubtful. A good illustration of the 
difficulty is found in the practice of astronomers who take many 
observations of the position of a celestial body and then get the 
average of their observations. Greater accuracy is achieved in 
astronomy perhaps than in any other science, and yet repeated 
measurements of the same thing vary slightly. Nevertheless, it 
may be said that the greatest homogeneity is achieved in the meas- 
urements of true variables. Some attributes, like the number of 
children in families, can be presented with a high degree of homo- 
geneity. But others offer greater difficulties: nationality, race, 
delinquency, mental disorder, and others. One of the reasons why 
social statistics has lagged behind economic statistics in the sys- 
tematic collection of data is the difficulty of securing comparable, 
that is, homogeneous, data. Why are juvenile court statistics, or 
any court statistics, almost uniformly unreliable? Mainly because
of the inability to reach agreement as to what are significant at-
tributes and what are the precise definitions of these attributes.
Perhaps too much has been attempted. The desire has been to 
understand the intangible socio-psychological causes back of crime 
and delinquency. But these have not yet been so defined that two 
different persons will report comparable facts. Only tireless trial 
and error, careful observation, and accurate recording will improve 
the quality of such social statistics. 

For the most part the degree of homogeneity attained in any 
collection of data is a matter of judgment. A precise definition of 
the items sought, in the mind of the observer, is still the best 
means of securing comparable data. On this point Professor Gid- 
dings says, “Obviously any fact of sort or of size, of quality or of 
quantity, is truly representative and therefore may without error 
be taken as a sample of a pluralistic field, if the difference between 
any other item whatsoever of the aggregate and any other item 
of it is negligible for the purpose in hand.” 10 If the pluralistic field,
that is, any number of attributes or variables of the same general 
kind, is relatively homogeneous, then even a small sample may 
be representative of the whole. In a highly heterogeneous collec- 
tion of data probably no sample would represent the aggregate. 
Good judgment is required in selecting an accurately defined 
pluralistic field and in choosing the sample. Statistical analysis may 
help in deciding whether a group of data are homogeneous or 
not. If data are put into frequency classes and plotted, assuming 
that a sufficiently large sample has been collected from the plu- 
ralistic field, lack of homogeneity will be revealed by two or more 
humps in the curve. That is, the frequency distribution will be 
bi-modal or multi-modal (methods of arriving at the mode are 
shown in Chapter VIII). There are other explanations of multi- 
modality, but this is one to look for. 
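
The check for more than one hump can be sketched roughly as follows. The ages are invented, the class width of five years is an arbitrary choice, and the functions frequency_table and count_humps are hypothetical helpers written for this illustration. The count is crude: in a small sample, chance irregularities will also register as humps, so the result is a prompt for further inquiry and not a proof of heterogeneity.

```python
from collections import Counter


def frequency_table(values, width):
    """Frequencies in consecutive class intervals of the given width."""
    counts = Counter((v // width) * width for v in values)
    lowest, highest = min(counts), max(counts)
    # Include empty classes so that gaps between humps are visible.
    return [counts.get(start, 0) for start in range(lowest, highest + width, width)]


def count_humps(freqs):
    """Count the local peaks in a list of class frequencies."""
    padded = [0] + list(freqs) + [0]
    humps = 0
    for i in range(1, len(padded) - 1):
        # A peak rises above the class before it and does not fall below the next.
        if padded[i] > padded[i - 1] and padded[i] >= padded[i + 1]:
            humps += 1
    return humps


# Invented ages: a young group and an older group thrown into one collection.
mixed_ages = [18, 19, 20, 20, 21, 21, 22, 23, 24,
              40, 41, 42, 43, 43, 44, 45, 46, 47]
# The young group taken alone.
young_only = [18, 19, 20, 20, 21, 21, 22, 23, 24]

print(frequency_table(mixed_ages, 5), count_humps(frequency_table(mixed_ages, 5)))
print(frequency_table(young_only, 5), count_humps(frequency_table(young_only, 5)))
```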

5. LOGIC AND STATISTICS 

“You can prove anything with statistics,” says the man in the 
street. Liars have been classified, and with some justification, as 
“liars, damned liars, and statisticians.” The statistician might reply, 
has not almost anything been proved by the use of history or by 
popular myths or by theology? When such questions are raised 
about the logical validity of a principle or method, a momentary 
impasse is reached. By way of explaining the popular skepticism of 

10 Op. cit., p. 83.




statistics, it should be emphasized that statistics is a method of 
analysis of data which may be applied to any data by anybody. The 
analyst may have “an axe to grind.” He may be excessively de- 
sirous of proving something that has utility for him or enhances 
his personal prestige. Statistics as a method is impersonal, as im- 
personal as mathematics upon which it depends. But it may be 
used by anyone as a means of pleading his special case. Some 
questions are so stated that they cannot be answered by statistical 
analysis, and others involve conflicting viewpoints accompanied by 
different definitions of terms. Professor Wolfe has cited a number 
of such questions in his discussion of statistics as a scientific method: 
Can the railroads continue to pay high wages and at the same time 
reduce transportation rates? What proportion of industrial work-
ers are getting a “living wage”? What is “normalcy”? What is
“prosperity”? What is overpopulation? What legislation is social- 
istic? What is confiscatory taxation? 

Commenting upon the type of question mentioned above, Pro- 
fessor Wolfe says: “It may be said, with truth, that these are 
questions involving standards of equity of which no objective defi- 
nition can be made. Yet they are the type of question upon which 
legislatures and the courts and the general public are constantly 
passing judgment, and toward the solution of which the scientific 
student of social matters should be expected to contribute objective 
data, if not formulated conclusions. It may be said that the scien- 
tific investigator should avoid problems involving such difficulties. 
But the patent fact remains that if the scientist does not grapple 
with them the non-scientist will, with results that can scarcely be 
expected to be as well founded as those at which the scientist will 
arrive. If we cannot be objective, we must be as objective as we 
can.” 11 It is in attempting to answer such questions as objectively as
possible that the reputable statistician is sometimes charged with 
being able to prove just anything. Likewise it is in dealing with 
such questions that the tyro in statistics does prove just anything, 
and in the estimation of the public implicates the competent stat- 
istician. Hence the logic of statistics requires a careful definition 
of what statistics can do and what it cannot do. If some of these 
matters of equity must be tackled by the statistician, then his wis- 
dom will be judged by his careful discrimination of objective data 
which he may analyze from qualitative matters not amenable to 

11 Wolfe, A. B., Conservatism, Radicalism, and Scientific Method, p. 247.
New York: The Macmillan Co., 1923. 




statistical treatment and concerning which he can have no opinion 
qua statistician. Thus, the charge that anything can be proved by 
statistics should be altered to read, “Persons who use statistics can 
prove anything.” The criticism is properly of the man and not of 
the method. 

Logic is commonly divided into deductive and inductive. De- 
ductive logic reasons from a general to a particular proposition; 
it applies to a particular case a truth that is known in general. 
Inductive logic reasons from particular cases to a general con- 
clusion. Both methods are used continually in the social sciences, 
and both must be used. The formulas of statistics are deductively 
derived from mathematics. They are assumed to be true. These 
methods are then used as tools for the analysis of aggregates of 
particular cases, namely, statistical data from which may be in- 
ferred a general conclusion. If the social studies are to grow and 
to develop toward a more scientific status, workers in these fields 
must use the conclusions of other investigators. Whenever such 
use is made of conclusions previously reached, deductive logic has 
been employed, the conclusions being used as a part of the data 
of an inductive study. Some discussions of these two aspects of 
logic seem to imply that they are antagonistic: that deduction is 
an outworn method of the Medieval Schoolmen, and induction is 
the bouncing boy of modern scientific methods. Such a position is 
untenable; they are complementary methods constantly used in 
scientific work. The difference between the older logic and the 
newer is that in recent times no general conclusions have been 
regarded as absolute; all are subject to modification in the light 
of new facts. Science criticizes the premises of reasoning as well 
as its conclusions. That is the mark of its special superiority over 
the Aristotelian logic. 

The collection of data bearing on a statistical problem is a step 
in the inductive process. Resemblances, differences, and relation- 
ships are noted. The data are classified, and then averages, dis- 
persions, trends, and correlations are computed. The reliability of 
the results turns upon the competency of the worker, and the 
whole process is moving from particulars to generals, guided and 
perhaps illuminated by much that is already known. Almost any 
project will involve the formulation of a hypothesis for a working 
base. Careful analysis of the data may demonstrate the truth of 
the hypothesis, or it may require its modification or abandonment. 
Statistical work undertaken without some kind of hypothesis is
likely to be pointless, but the hypothesis should be held tentatively,
only tentatively. The prestige of the worker is not bound up with 
selecting correct hypotheses so much as it is with painstaking, 
intelligent work. Defense of an hypothesis in any degree because 
it is the child of the worker vitiates confidence in his work. An 
inductive conclusion must be inevitable in the light of the facts. 
Of course, it may be partly demonstrated and held tentatively for 
further investigation. 

Statistics is liable to fall into the same logical fallacies as any 
other kind of reasoning or scientific work. A few of the more 
common ones will be indicated. Of fallacies characteristic of de- 
ductive reasoning, perhaps those known as non sequitur and petitio 
principii are the most common in statistical work. The phrase, 
non sequitur (which may be translated, “it does not follow”), is 
applied to any loose argument in which the conclusion does not 
seem to follow from the premises. This kind of fallacy in typical 
logical form is more likely to occur in connection with the inter- 
pretation of the applicability of a statistical formula to a given 
problem than it is in connection with reasoning about the data. An 
illustration stated in syllogistic form will indicate this danger: 

The coefficient of correlation is a measure of the degree of inter- 
dependence of two variables. 

This expression is a coefficient of correlation. 

Therefore, it measures the interdependence between two variables. 

So far as formal logic is concerned this syllogism is stated cor- 
rectly and ought to be water-tight, but the fact is that a coefficient 
of correlation might indicate mere coexistence instead of inter- 
dependence. The beginner in statistics is not unlikely, however, 
to be overenthusiastic about the discovery of a method of show- 
ing causation between two series of social facts and to assume that 
every coefficient of correlation indicates causation. As pointed out 
above, one of the services of inductive logic has been to criticize 
the premises upon which reasoning is based. The major premise of 
this syllogism requires criticism. It should be written, “Some co- 
efficients of correlation measure the degree of interdependence 
between two variables.” That statement of the major premise 
allows for mere chance correlations, of which there are many in 
social statistics. The syllogism, as stated, is known from experience 
to be not true. 
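
The point about coexistence can be illustrated by a short computation. The sketch below computes the coefficient of correlation for two invented series which have nothing in common except that both rise over the same years. The figures and the names `pearson_r`, `radio_sets`, and `hospital_beds` are hypothetical and are not drawn from any study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment coefficient of correlation for two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical annual figures: both series simply grow with time.
radio_sets = [10, 14, 19, 25, 30, 37, 41, 48]
hospital_beds = [200, 206, 211, 219, 224, 228, 236, 241]

r = pearson_r(radio_sets, hospital_beds)
print(round(r, 3))
```

The coefficient comes out close to 1, yet nothing in the computation shows that either series depends on the other. A shared upward drift is enough to produce it.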

The fallacy of petitio principii is generally interpreted by the
phrase, “begging the question.” It is the assumption that the
conclusion is true without proving it. A special form of it is called
“reasoning in a circle,” which is the attempt to prove a conclusion 
from a premise, when the conclusion itself is a part of the proof 
of the premise. Here is an illustration of reasoning in a circle: 
“The increase of insanity indicates a weakness in our civilization, 
because, if it did not, there would not be so many insane persons.” 
In the first part of this statement, insanity is the evidence of weak- 
ness in our civilization, and in the second part the weakness of 
our civilization is proof of the many insane persons. The circu- 
larity of this reasoning is obvious, but it is more subtle in elaborate 
discussions of a problem and is harder to detect. 

Some of the fallacies characteristic of inductive reasoning, such 
as inaccurate observation and finding what one looks for, have 
already been suggested, but special attention may appropriately be 
directed to certain fallacies of judgment and of conception. Errors 
of judgment may lead to the assumption of something as a cause 
when it is not or to the belief that, because a certain event precedes 
another event, the first is the cause of the second. The problem is 
to make sure that the occurrence of such events in sequence is not 
mere coincidence but interdependence. These errors of judgment 
are back of most magic of primitive peoples and of superstition of 
more highly civilized persons. But they may easily occur in statis- 
tical work. A new high tariff is passed by Congress, and within 
the next few months the country is in a stage of rising prosperity;
therefore, say the high tariff politicians, the tariff caused pros- 
perity. As a matter of fact, a study of the economic history of the 
country for the past generation indicates that the peaks of pros- 
perity come irrespective of whether the party in power is protec- 
tionist or non-protectionist. Another instance of this fallacy is the 
common statement that insanity is increasing. The number of 
insane in institutions is increasing, but that fact does not prove 
that there are more insane persons in proportion to population 
than there were twenty-five years ago. Fallacies of the conceptual 
processes occur particularly in the attempt to formulate generaliza- 
tions. An attempt may be made, as it has been frequently made in 
sociology and other social sciences, to state a general scientific law
which applies to a wide range of phenomena, before enough facts 
have been examined or where the data are too heterogeneous to 
permit any such generalizations. A more common form of conceptual
error among social statisticians, however, occurs in connection
with prediction of the trend of a series of events. Index
numbers may be computed and the long-time trend worked out. 
If the index has to do with industrial production, it is important 
in many ways to estimate what the index may be a year or two in 
the future. This is called extrapolation. It is an important but a 
precarious business. New factors may enter the business situation, 
as they did in 1929, which invalidate all the estimates of perpetual 
prosperity. The limitations surrounding the statistician in the pres- 
ent state of knowledge in the social sciences should be a curb to 
overconfidence in such predictions on the basis of a computed long- 
time trend of business, prices, wages, crime, or what not. 12 
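
A minimal sketch of extrapolation follows, assuming a straight-line trend fitted by least squares. The index numbers are invented for illustration, and the names `fit_line` and `forecast_1931` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    return a, b

# Hypothetical index of industrial production, 1922-1929 (not real data).
years = list(range(1922, 1930))
index = [100, 104, 107, 112, 115, 119, 124, 127]

a, b = fit_line(years, index)
forecast_1931 = a + b * 1931  # extrapolation two years beyond the data
print(round(b, 2), round(forecast_1931, 1))
```

The forecast is merely the fitted line carried forward. It assumes that the conditions which produced the past trend continue unchanged, which is precisely the assumption that new factors may upset.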

6. SCIENTIFIC LAW OR SCIENTIFIC METHOD? 

The ultimate aim of scientific research in any field is the dis- 
covery of regularities in the data which permit of brief statement 
in the form of a scientific law. A law of science is a shorthand 
description of regularities among a certain kind of data. The rec- 
ognition of a general description of regularities of phenomena 
accorded by scientists the status of a scientific law usually depends 
upon its statement in mathematical terms. The author is not aware 
of any scientific law, properly so called, which has not been so 
stated. As familiar examples, the law of falling bodies and the 
law of gases may be mentioned. In biology there is Mendel’s law 
of heredity, and, coming closer to the social sciences, the law of 
population growth formulated by Pearl and Reed. Many others 
from the natural sciences could be mentioned and some from the 
biological sciences. But the further the data of a science are from 
such elemental facts as weight, gravitation, motion, and electrons, 
the more difficult it is to state regularities in the simplified and 
absolute terms of mathematics and, hence, the less likely is the 
science to discover laws. 

12 For this discussion of fallacies, the writer has drawn upon Hibben, J. G.,
Logic, Deductive and Inductive, Part I, Chap. XIX, and Part II, Chap. XVI.
New York: Scribner’s, 1906.

The social sciences deal with such complex data that the
generalizations they make about social phenomena will rarely attain
the exactness required for recognition as scientific laws. Perhaps
the so-called law of diminishing returns in economics is as good
an example as has yet been formulated. Seager stated this law
some years ago in the following way: “. . . after a certain point
has been passed in the cultivation of an acre of land or the
exploitation of a mine, increased applications of labor and capital yield
less than proportionate returns in product. . . .” 13 This law can
be expressed with an approximation to accuracy in mathematical 
terms. Of course, it assumes that various factors will remain con- 
stant, such as rainfall, fertility, and quality of seeds used in farm- 
ing. Since these things do not remain constant, the practical ap- 
plicability of the law is limited, but it is an approximation to the 
kind of statement which in the natural sciences is called scientific 
law. Another illustration from economics is Gresham’s Law which 
states that, when two kinds of money are in circulation, the cheaper 
money drives the dearer money out of circulation. The extent to 
which this occurs could be stated mathematically. There are not 
many such close approximations to scientific law in the social sci- 
ences. Perhaps sociology, political science, and social work can 
boast of none. 
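
As a rough illustration of what a mathematical statement of diminishing returns might look like, the sketch below assumes that product varies as the square root of the labor applied. This functional form, the scale factor, and the name `product` are assumptions made only for the example. With this form returns diminish from the first unit, whereas the law as quoted speaks of diminution only after a certain point has been passed.

```python
from math import sqrt

def product(units_of_labor, scale=10.0):
    """Hypothetical yield of one acre: rises with labor, but ever more slowly."""
    return scale * sqrt(units_of_labor)

# Added product from each successive unit of labor applied to the same acre.
marginal = [product(n) - product(n - 1) for n in range(1, 7)]
print([round(m, 2) for m in marginal])
```

Each additional unit of labor adds something to the product, but less than the unit before it.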

But statistics is an application of mathematics. If it is applicable 
to social data, ought we not to expect to discover some regularities 
of social phenomena which can be stated as scientific laws? It is 
possible, but statistics will in general have a humbler task in the 
social sciences and in social work. It will be more often of adminis- 
trative value than as a means of discovering laws. The aim, of 
course, is ultimately to discover laws. But the more immediate 
ideal in the social sciences is a faithful application of scientific 
method to the study of their data. In the words of Karl Pearson, 
“The man who classifies facts of any kind whatever, who sees 
their mutual relations and describes their sequences, is applying the 
scientific method and is a man of science.” 14 Scientific method, in 
Pearson’s definition, is a way of arriving at a true interpretation 
of facts, their relations, and their sequences, regardless of whether 
or not a generalization qualifying as a scientific law is ever made. 
The social statistician is a scientific worker in that sense. Much 
valuable work of this kind has already been done, and much more 
will be done with the increased use of statistical methods in the 
social sciences and in social work. 

13 Seager, Henry R., Principles of Economics, p. 129. New York: Henry
Holt & Co., 1913.

14 Pearson, Karl, Grammar of Science, 1911 edition, Part I, p. 12.



CHAPTER IV


Working Out a Statistical Problem


Every statistical problem presents special points for consideration, 
but there are a few general matters that may be discussed as steps 
in procedure. There can be no statistical problem for the student 
or worker, unless there has arisen some question the answer to 
which is not immediately apparent. The question may be very 
indefinite at first. After it has been thought out and becomes 
clearer, the worker begins to think of the kind of data required to 
answer the question. Once the required facts are decided upon, 
the next problem is to gather them. This is usually the most ardu- 
ous step in statistical work, because accurate, comparable data may 
have to be gathered first-hand. That means schedules, question- 
naires, report forms, and interviews. Or the data may already have 
been assembled in reports, in which case the field work is elimi- 
nated, but a new problem of determining the comparability of 
data has at once arisen. Data which the investigator gathers from 
the original sources are known as primary data; those which have 
already been assembled by other field workers and are in pub- 
lished reports, perhaps for wholly different purposes, are secondary 
data. This kind of classification does not imply that secondary 
data are inferior to primary data; on the contrary, they may be 
better than the investigator under the best of circumstances could 
gather for his own purposes. For example, the reports of the 
Bureau of the Census contain data which are secondary for any 
outside student, but no individual could make for his own pur- 
poses a census of the whole population which, assuming that he 
could pay the cost, would be as reliable as the secondary data in 
the census volumes, because the technique of taking a census has 
been developed and improved by the Bureau for over a hundred 
years. But for a special problem for which no data have been 
collected by reliable agencies, the investigator will have to make 
his own field investigations.

In order to indicate more concretely the steps in working out
statistical problems, two examples of such studies will be de- 
scribed at some length. The first will illustrate the procedure in a 
problem for which data were gathered by the investigator, and 
the second will describe a well-known study based upon secondary 
data. 


1. A PROBLEM EMPLOYING PRIMARY SOURCES 1

Any statistical problem is usually taken from a general field of 
interest. Because of his connection with Indiana University in 
which a number of studies of crime from various viewpoints were 
under way, the author undertook to study a single problem, the 
distribution of felonies in 1930, in Indianapolis. This problem is 
a mere bagatelle, when the whole field of crime, even in Indianapolis,
is considered. But it is important for various reasons: (1) it is
fundamental to adequate police protection; (2) it is important as 
one means of determining the populational sources of criminals; 
(3) it is important to the courts to know whether a given criminal 
comes from a neighborhood in which many other criminals live. 
Other reasons for the study of this particular aspect of crime might 
be cited, but these suggest why the study was undertaken. 

Once the problem has been roughly outlined in the mind of 
the investigator, the next step is to define it and determine what 
divisions it may have. Only objective facts can be considered, and 
they must be available. What are the aspects of the distribution of 
felonies in a city which may be studied by statistical methods? 
First, crimes occur at certain places — there are exceptions, such as 
transporting stolen property, when the crime occurs in a succession 
of places, but in general the charge against an alleged criminal 
specifies a place of the offense. Second, criminals live, even if only for
a day, at definite places. They are distributed over the city, and 
in some places their density is greater than in others. Third, the 
crime may be committed at the residence of the criminal, or he 
may go some distance from home to commit the offense. Do some 
types of crimes tend to be committed near the residence of the
criminal and others at varying distances? Fourth, crimes are distributed
by sex, certain types being more prevalent among males
than females, and vice versa. Fifth, crimes are distributed by age 
of the criminals. Taking all crimes together, there are age groups
in which criminality is more prevalent than in others. Some types
of crimes are usually committed by young men or women, and
other types are committed by persons somewhat older. Thus, the
distribution of crime in a city may be approached from five different
angles, the data for each of which are fairly objective and
readily obtainable through the cooperation of the court.

1 White, R. Clyde, “The Relation of Felonies to Environmental Factors in
Indianapolis,” Social Forces, Vol. X, No. 4, pp. 498-509.

How were the data obtained for the study of distribution of 
felonies? A record form was made out and reproduced by mimeo- 
graph. Figure I gives this form: 


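Year 1930                                        Month of January

Case No. | Charge | Tract of Residence | Tract of Offense | Sex | Age
---------+--------+--------------------+------------------+-----+-----
         |        |                    |                  |     |
         |        |                    |                  |     |
         |        |                    |                  |     |

Figure I. Cases Disposed of by Marion County Criminal Court for the
City of Indianapolis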


The two columns headed, “Tract of Residence” and “Tract of 
Offense,” indicate the census tract of the city in which the criminal 
lived and the one in which he committed his offense. These census 
tracts are small areas with highly homogeneous population — 
homogeneous as to nationality, color, economic and social status, 
and age and sex distribution. Instead of taking the street number 
of the offender, the census tract was given. The distance of resi- 
dence from the place of the offense was measured from the center 
of the tract of residence to the center of the tract of offense. Many 
offenses were committed in the tract of residence, in which case 
the distance traveled was negligible, because in no tract could the 
offender be more than a few blocks from home. Who should get 
this information from the criminal? It would have been a full-time 
job for the investigator to do that. The information is ordinarily 
obtained as a routine matter by the court, except for the census 
tract designations. The chief probation officer of the court agreed 
to use the forms worked out and to get the information on each
case for the study. Every case of guilt disposed of by the court
during the year was included. At the end of the year the com- 
pleted forms were turned over to the investigator. 
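
The measurement of distance from tract center to tract center may be sketched as follows. The text says only that the distance was measured between centers, so the straight-line measure used here is an assumption. The tract numbers are illustrative, and the coordinates, the table `TRACT_CENTERS`, and the function `distance_traveled` are hypothetical.

```python
from math import hypot

# Hypothetical tract centers as (x, y) map coordinates in miles.
# The tract numbers are illustrative; the coordinates are invented.
TRACT_CENTERS = {53: (1.0, 2.5), 56: (0.0, 0.0)}

def distance_traveled(residence_tract, offense_tract, centers=TRACT_CENTERS):
    """Straight-line miles between the centers of two census tracts."""
    if residence_tract == offense_tract:
        return 0.0  # same tract: the distance is treated as negligible
    (x1, y1), (x2, y2) = centers[residence_tract], centers[offense_tract]
    return hypot(x2 - x1, y2 - y1)

print(round(distance_traveled(53, 56), 2))
```

Using tract centers in place of street addresses loses a little precision, but it lets one small table of coordinates serve for every case.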

After collecting the data, the next step is to assemble them in 
some form that permits analysis. This could have been done by 
hand, because the number of cases was not large, and the number 
of facts about each case was small. If this had been done, a work 
sheet would have been made up on which the age, sex, offense, 
place of residence, place of offense, etc., would have been tallied 
in, but this would have required a good deal of time. Since the 
investigator had access to machines for punching and sorting cards, 
the first step in assembling the data was to give each fact a symbol 
which appeared on the cards and punch the symbols out. The card 
is reproduced opposite. 

Such cards may be worked out for any kind of statistical study, 
where large numbers of items and cases make hand tabulation 
onerous and expensive. This card was planned for a larger study 
of juvenile delinquency and has columns for many more facts 
than were obtained for the adult felons in the study of distribution 
of felonies in Indianapolis, but it contains columns for all the facts 
requested, and it could be used for tabulating the data for felonies 
also. The black spots indicate symbols punched out. The student 
will notice that at the head of each column marked off with heavy 
lines is printed the name of the fact to be punched in this column. 
Under “Residence Tract” the 3 in units column and the 5 in tens 
column are punched, that is, the tract is No. 53. Under “Offense 
Tract” the 6 in units column and the 5 in tens column are punched, 
which means that the offense was committed in tract No. 56. All 
the common offenses are numbered from 1 to 99, and it will be 
noticed that the 2 in units column is punched under the column 
headed “Offense,” which means that the crime committed is de- 
noted by the figure 2 and is assault and battery with intent to kill. 
Under “Age” 2 in units column and 2 in tens column are punched, 
that is, the offender was 22 years old. Under “Sex” 1 is punched 
which is the symbol for male sex, and under “Color” 1 is punched 
which is the symbol for white race. Each case has one of these 
cards, and the appropriate symbols were punched on it. The entire 
time required for punching the cards was about three hours. Hand 
tabulation would have required much longer. After a little experi- 
ence of reading the symbols from the punched cards, it is quite 
easy to read off any data that might be wanted. However, that is
not necessary, because the sorting machine which appears below
does it more rapidly and with less chance of mistakes. 
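
The coding of a case for punching amounts to writing each fact as a number in a fixed group of columns. The sketch below encodes the case described above. The field widths, their order, and the names `LAYOUT`, `punch`, and `read_card` are assumptions made for illustration; the actual column positions on the card are not given in the text.

```python
# Hypothetical card layout: (field name, number of columns).
# Widths and order are assumptions; the codes follow the case described above.
LAYOUT = [("residence_tract", 2), ("offense_tract", 2), ("offense", 2),
          ("age", 2), ("sex", 1), ("color", 1)]

def punch(case):
    """Encode one case as a fixed-width string of digits, one digit per column."""
    columns = []
    for field, width in LAYOUT:
        value = case[field]
        if not 0 <= value < 10 ** width:
            raise ValueError(f"{field}={value} does not fit in {width} column(s)")
        columns.append(str(value).zfill(width))
    return "".join(columns)

def read_card(card):
    """Decode a punched string back into a dict of numeric codes."""
    case, pos = {}, 0
    for field, width in LAYOUT:
        case[field] = int(card[pos:pos + width])
        pos += width
    return case

case = {"residence_tract": 53, "offense_tract": 56, "offense": 2,
        "age": 22, "sex": 1, "color": 1}
card = punch(case)
print(card)  # 5356022211
```

Reading the card back with `read_card(card)` returns the original codes, which corresponds to reading the symbols off a punched card by eye.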



Figure III. The Electric Key Punch


Figure IV. The Electric Horizontal Sorting Machine


When the cards are punched, they are ready to be put into the
sorting machine for any kind of classification that may be desired.
Perhaps the first sorting was on sex. It is important for the analysis
to separate males and females. Under “Sex” at the bottom of
the column a small figure 22 is printed. This number is the guide
for setting the machine to sort the sexes. The cards are put into the
feeder, the machine is set, and then the electric button is pushed.
About 400 cards per minute go through the machine used — other 
machines sort at a higher speed. All of the cards with 1 punched, 
that is, males, fall into a pocket, and all the cards with 2 punched, 
that is, females, fall into a different pocket. Thus, male and female 
criminals were separated. Then the males and females were sorted 
by census tract, age, etc. Any kind of an analysis could be made. 
The cards in any particular sorting were counted by the machine 
and notation made of the number. When all the significant sortings 
had been made, the data were ready for further analysis. 
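
The sorting and counting just described may be imitated in a few lines. In this sketch each card is a small record, one pass of the sorter drops cards into pockets according to a single field, and the cards in a pocket are then counted. The four cases and the name `sort_cards` are hypothetical; the sex codes follow the text.

```python
from collections import Counter, defaultdict

def sort_cards(cards, field):
    """Drop each card into the pocket for its code in `field`, as in one sorter pass."""
    pockets = defaultdict(list)
    for card in cards:
        pockets[card[field]].append(card)
    return dict(pockets)

# Hypothetical cases; sex code 1 = male, 2 = female, as in the text.
cards = [
    {"sex": 1, "residence_tract": 53, "age": 22},
    {"sex": 1, "residence_tract": 56, "age": 19},
    {"sex": 2, "residence_tract": 53, "age": 31},
    {"sex": 1, "residence_tract": 53, "age": 17},
]

by_sex = sort_cards(cards, "sex")
# Second pass: count the male cards falling in each residence tract.
males_by_tract = Counter(card["residence_tract"] for card in by_sex[1])
print(len(by_sex[1]), len(by_sex[2]), dict(males_by_tract))
```

Repeating the pass on another field, such as age, gives any cross-classification that may be wanted.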

Tables were then made. These tables contained the number of 
crimes committed in each census tract, the number of criminals 
living in each tract, the distribution by sex and five-year age inter- 
vals, number of crimes of each type, comparison of the character- 
istics of the general population with the criminals, distances from 
home to the place of the offense, and other cross-classifications. 
For the purposes of illustrating the procedure in the analysis of 
a statistical problem it is not necessary to present all the tables and 
graphs included in the completed study, but a few tables will be 
given to make this discussion more concrete. 

TABLE I

Nine Kinds of Crime against Property, Showing the Number of Each, the
Average Distance between the Home of the Offender and the Place of the
Offense, and the Number of Cases in Which the Offense Was Committed in
the Same Census Tract as the Residence

                                                     Average Distance   Offenses
                                                     between Residence  Committed in
                                        Number of    and Place of       Same Tract
Offense                                 Offenses     Offense in Miles   as Residence
--------------------------------------  -----------  -----------------  ------------
Total                                       436            1.74              60
Banditry, automobile                          9            3.43               0
Embezzlement                                 21            2.79               3
Robbery                                      20            2.14               1
Vehicle taking                               76            1.77               7
Burglary                                    121            1.76              11
Grand larceny                               117            1.53              23
Obtaining money under false pretense         38            1.47               6
Petit larceny                                25            1.42               6
Receiving stolen goods                        9             .90               3

There appears to be a tendency for persons committing crimes 
against property accompanied by violence to go farther from 
their places of residence for this purpose than is the case with
crimes against property without violence. The principal exception
to this tendency is embezzlement. The latter is partly due to the 
fact that embezzlement is usually committed at a place of business; 
eleven of the 21 cases of embezzlement were committed in census
tract 56 which is in the heart of the business district of the city. 
Persons who have an opportunity to embezzle funds usually have 
responsible positions and draw good salaries. Such persons in In- 
dianapolis are likely to live at some distance from the business 
district. If this is the proper explanation here, it removes the one 
important exception to the inference drawn above. No case of 
automobile banditry occurred in the residence tract of the bandit, 
and only one case of robbery occurred in the same tract as the 
residence of the robber. If there were some way of accurately 
rating crimes against property according to the seriousness of the 
offense, a curve might be drawn to show the connection between 
distance traveled and the seriousness of the offense, or the degree 
of correlation might be computed. But this is not possible. The 
inference has to be made tentatively from appearances in the table. 
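
The tentative inference can at least be checked arithmetically against Table I. The sketch below takes automobile banditry and robbery as the crimes against property with violence, which is this example's reading of the paragraph and not a classification given in the table. It computes their weighted average distance and compares the share of such offenses committed in the residence tract with the share for all other offenses in the table.

```python
# Rows transcribed from Table I: (offense, number, average miles, same-tract cases).
# Treating these two offenses as "property crimes with violence" is an
# assumption of this sketch, not a classification given in the table.
WITH_VIOLENCE = [
    ("Banditry, automobile", 9, 3.43, 0),
    ("Robbery", 20, 2.14, 1),
]
ALL_OFFENSES = 436    # "Total" row of Table I
ALL_SAME_TRACT = 60   # "Total" row of Table I

n_violent = sum(number for _, number, _, _ in WITH_VIOLENCE)
same_violent = sum(same for _, _, _, same in WITH_VIOLENCE)

# Weighted mean distance: each offense counts in proportion to its cases.
violent_mean_miles = sum(n * miles for _, n, miles, _ in WITH_VIOLENCE) / n_violent

violent_same_share = same_violent / n_violent
other_same_share = (ALL_SAME_TRACT - same_violent) / (ALL_OFFENSES - n_violent)

print(round(violent_mean_miles, 2))        # 2.54
print(round(100 * violent_same_share, 1))  # 3.4
print(round(100 * other_same_share, 1))    # 14.5
```

The 2.54 miles for these two offenses may be set beside the 1.74 miles shown in the table for all nine offenses together. With only 29 cases in the group, the comparison supports the inference without establishing it.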

Another interesting fact about this study of felonies is the age 
distribution of the criminals. This is given in the table below: 

TABLE II

Age Distribution of 651 Felons Appearing before the Marion County, Indiana,
Criminal Court in 1930

Age Group    Both Sexes    Male    Female
---------    ----------    ----    ------
All Ages        651         631      20
16-19           194         193       1
20-24           180         173       7
25-29            74          70       4
30-34            88          85       3
35-39            49          48       1
40-44            30          30       0
45-49            21          18       3
50-54             4           3       1
55-59             5           5       0
60-64             3           3       0
65-69             2           2       0
70-74             1           1       0


The concentration of this criminal group in the ages below 25 
years is striking; over half of them are less than that age. The 
low percentage of women is also impressive. Felonious crime is
a problem to a large extent involving young men. When this age 
distribution is broken down to particular kinds of offenses, it is
found that crimes against property with violence and vehicle taking
are committed mainly by young men. Burglars and thieves have a 
somewhat higher average age, and embezzlers a still higher one. 
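
The two proportions remarked upon in Table II can be computed directly from its figures. The sketch below is only an arithmetic check; the name `AGE_TABLE` is hypothetical.

```python
# Age group -> (male, female), transcribed from Table II.
AGE_TABLE = {
    "16-19": (193, 1), "20-24": (173, 7), "25-29": (70, 4), "30-34": (85, 3),
    "35-39": (48, 1), "40-44": (30, 0), "45-49": (18, 3), "50-54": (3, 1),
    "55-59": (5, 0), "60-64": (3, 0), "65-69": (2, 0), "70-74": (1, 0),
}

total = sum(m + f for m, f in AGE_TABLE.values())
under_25 = sum(sum(AGE_TABLE[group]) for group in ("16-19", "20-24"))
females = sum(f for _, f in AGE_TABLE.values())

print(total)                             # 651
print(round(100 * under_25 / total, 1))  # 57.5
print(round(100 * females / total, 1))   # 3.1
```

Somewhat more than 57 per cent of the felons were under 25 years of age, and about 3 per cent were women.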

No other tables will be given, but a brief summary of other 
findings will be made. Crimes against the person, like assault and 
battery, manslaughter, murder, and rape, occur nearer the home 
of the offender than do crimes against property. The average dis- 
tance from home of 37 crimes of this sort was .84 of a mile, and 
19 of these were committed within the residence census tract. Of 
nine cases of manslaughter eight occurred in the census tract where 
the offender lived. Seven out of 16 cases of assault and battery 
with intent to kill were committed in the residence tract. Three 
out of eleven cases of rape occurred in the residence tract; this,
in the matter of distance, is more like crimes against property. 

The concentration of the residences of criminals is in the center 
of the city and especially in those census tracts where rooming 
houses prevail. Likewise the places where felonies are committed 
are in the downtown district. This is partly due to the fact that so 
many felonies are crimes against property, and wealth is concen- 
trated in the downtown district. This condition is similar to that 
found in some other cities, notably Chicago, where studies of the 
distribution of crime have been made. 

The essential steps in this study have been: (1) definition of 
the specific problem to be studied; (2) deciding upon the data to 
be obtained about each criminal; (3) framing a schedule on which 
to record the data; (4) arranging with an official of the court to 
obtain the data; (5) coding the data for each case so that it could 
be punched on the tabulation cards; (6) punching the cards; (7) 
sorting the cards according to various combinations of facts; (8) 
assembling these data on work sheets; (9) making tables of various 
kinds; (10) study and interpretation of the data; (11) the written 
report. 


2. A PROBLEM EMPLOYING SECONDARY SOURCES 

The general procedure in working out a problem concerning 
which data are drawn from secondary sources is somewhat different 
from that in the preceding problem. To illustrate this type of sta- 
tistical problem a well-known study, Social Aspects of the Business 
Cycle, by Dorothy S. Thomas, will be used. 2 This particular study
has been selected for several reasons: (1) it is exclusively
statistical; (2) it has been widely accepted as a good example of work
in social statistics; (3) the author has depended entirely upon
secondary sources; (4) the author has been under the necessity of
evaluating the reliability of her material before proceeding to
statistical analysis; (5) since the study was based upon data drawn
from two nations and represented all of the reliable statistical
data on the subject in both nations, the author has been careful to
point out just what her study contributes to the knowledge of the
relationships of the business cycle and other social series.

2 Thomas, Dorothy S., Social Aspects of the Business Cycle. London: George
Routledge & Sons, Ltd., 1925.

Many social scientists had noticed the apparent relationships of 
general economic conditions to certain social problems, but up to 
the time Dr. Thomas undertook her study nobody had made a 
thoroughgoing analysis of the problem. In order to delimit her 
own problem she had to examine other discussions of the subject, 
and in the book she has preceded her analysis by a critical discussion 
of previous works on social aspects of the business cycle. She says, 
“The subject has long been of interest to economists, sociologists, 
criminologists, and statisticians, but has received no wholly ade- 
quate treatment in which the relationships between these various 
social phenomena and the business cycle have been classified and 
expressed in quantitative terms.” 3 Economists had noticed that 
marriages, births, and deaths from certain diseases seemed to be 
associated with the ups and downs of prices and of general busi- 
ness conditions. Some of them had noticed that, not only temporary 
dependency, but pauperism seemed to increase after a severe busi- 
ness depression. Others had remarked that alcoholism is a disease 
of prosperity. Certain crimes against property occurred more often 
in times of depression. Some criminologists have called attention to 
the fact that an increase in certain kinds of crime lagged behind 
the rise of prices but seemed to be connected with this economic 
factor; others have believed that crime fluctuates according to 
general fluctuations in industrial conditions rather than with the 
price level alone. Statisticians who have turned their attention to 
this problem have been concerned generally with the relation of 
the business cycle to marriage rates and birth rates, though now 
and again other phenomena have been considered. 

3 Op. cit., p. 24.

Dr. Thomas notes increasing attention to criticism of methods
of analyzing such data as time passes but concludes that none of
the studies is sufficiently comprehensive to permit anything like a
general conclusion regarding social aspects of the business cycle.
The business cycle is the term applied to the ups and downs of 
general economic conditions which recur every few years and 
which are in process continuously. The trend of economic condi- 
tions is another matter: it refers to the direction of growth over a 
long period of years, say, forty or fifty or more. This trend may 
be upward, downward, or curvilinear. Whatever its direction, it 
should not be confused with the short changes known as cycles or 
the still shorter seasonal variations. The chief criticism of method 
which may be made of earlier discussions of the social aspects of 
the business cycle is that they were really concerned with both the 
long-time trend and with the cyclical changes. A few later statistical
studies tried to separate these two kinds of change, but no
comprehensive study was made until Dr. Thomas and Professor 
William F. Ogburn undertook to consider all the available data 
in the United States and to eliminate the long-time trend from 
their data so that they could study only the effects of the short, 
cyclical changes upon social phenomena. 4 This led Dr. Thomas
to undertake some more detailed study of certain American data 
and to supplement this with a comprehensive study of similar 
data in England, where social and economic statistics have been 
kept for a longer period than in the United States. 

Since all the various social phenomena were to be compared with 
business cycles, the first problem Dr. Thomas had to attack was 
the discovery of data for, and the computation of, an index of 
general business. This was to be the independent variable in all 
cases. No single type of business could be taken as an index of 
general business. Several different economic series had to be com- 
bined. After an examination of various kinds of economic data, 
the following were selected for combination into a general index 
of business in England and Wales: exports of produce, Sauerbeck 
index numbers, percentage unemployed, production of pig iron, 
production of coal, railway freight traffic receipts, and provincial 
bank clearings. These series were selected for the following rea- 
sons: “In the first place, the series selected must move synchro- 
nously. There is frequently a difference of two or three years 
between the maximum of two representative series of business 
statistics, although both move in cycles. Series must be selected
which reflect closely the general business situation, and series
which are so sensitive that they forecast the general movement,
as well as those which lag considerably behind, must be discarded.
The series must also be as widely representative as possible of all
of the most important phases of economic activity which are
affected by the business cycle.” 5 These conditions seemed to the
author to be met by the series of business data mentioned above.
Accordingly, an index was computed. It should be stated that, if
such a problem as this were undertaken now, it would not be
necessary for the investigator to compute an index of general
business; several reliable ones have been computed and are published
currently, the most complete being those published by Standard
Trade and Securities Service.

4 Ogburn, William F., and Thomas, Dorothy S., “The Influence of the Business
Cycle on Certain Social Conditions,” Jour. Amer. Stat. Soc., September,
1922.

After computing the index numbers, the problem still remained 
to remove the effects of the long-time upward, downward, or 
curvilinear trend so that the cyclical changes would be uncompli- 
cated with other types of change. Where quarterly social data 
were used, it was necessary to take another step and remove sea- 
sonal variations from a general business index based upon quar- 
terly economic data. The important point for our discussion is 
that here is an example of rigorous effort to measure only what 
was intended and not a number of things that were outside of the 
problem as defined. Are there cycles in social phenomena which 
are determined in any degree by cycles in business conditions? 
That is the problem before the investigator. The variations left 
in the index numbers after removing the trend and seasonal varia- 
tions are the cyclical changes. In order that these changes, whether 
of business, marriage rates, birth rates, crime, or other series, 
might be strictly comparable they were reduced to their respective 
units of variation (standard deviation, in this case) from their 
arithmetic averages. The student will not understand the full sig- 
nificance of seasonal variations, cycles, trends, and standard devia- 
tion until later chapters, but all that is required at this point is 
to recognize the use of these methods for rendering statistical 
data comparable in this particular problem. It is a part of the 
method of science — one of the rather tortuous paths to honesty 
in social science. 
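
The reduction of a series to comparable units can be illustrated with a minimal sketch. It assumes a straight-line trend fitted by least squares (the study also allowed for curvilinear trends) and annual data, so that no seasonal correction is needed. The index numbers are hypothetical and are not taken from the Ogburn-Thomas study; the function names are likewise illustrative.

```python
from statistics import mean, pstdev

def linear_trend(values):
    """Least-squares straight line fitted to values observed at times 0, 1, 2, ..."""
    n = len(values)
    t_bar = (n - 1) / 2
    y_bar = mean(values)
    slope = (sum((t - t_bar) * (y - y_bar) for t, y in enumerate(values))
             / sum((t - t_bar) ** 2 for t in range(n)))
    return [y_bar + slope * (t - t_bar) for t in range(n)]

def cycles_in_sd_units(values):
    """Deviations from the trend line, expressed in units of their standard deviation."""
    trend = linear_trend(values)
    deviations = [y - f for y, f in zip(values, trend)]
    centre = mean(deviations)
    spread = pstdev(deviations)
    return [(d - centre) / spread for d in deviations]

# Hypothetical annual index numbers (assumed for illustration only).
business_index = [100, 104, 103, 109, 115, 112, 118, 125, 121, 128]
cycles = cycles_in_sd_units(business_index)
```

Whatever the original units, a series treated in this way has an average of zero and a standard deviation of one, which is what allows business, marriage, and crime series to be placed on the same scale.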

The social series of data were selected, first, because they 
seemed to be accurate, and, second, because it was suspected that 
they were affected by the business cycle. These principles of selec- 

5 Op. cit., pp. 12-13.





tion at once ruled out many series, and, hence, narrowed the 
problem. The social series finally chosen were marriages, births,
deaths, pauperism, alcoholism, crime, and emigration. Records 
for all these were fairly complete in England and Wales for a 
long period, and some of them were complete for a considerable 
period in the United States. Each social series has various aspects. 
Under marriages the relations of the business cycle to marriage 
rates, to prostitution, and to divorce were computed. Birth data 
included birth rates, illegitimacy, deaths from childbearing, and 
premature births. Deaths were broken down into general death 
rates, infant mortality, deaths from tuberculosis, and suicides. Pau- 
perism had three phases: indoor, outdoor, and casual. Alcoholism 
included data on per capita consumption of spirits, prosecutions for 
drunkenness, and deaths from alcoholism. Crime had six divisions: 
all indictable crimes, crimes against property with violence, crimes 
against property without violence, malicious injuries to property, 
crimes against the person, and crimes against morals. Under emi- 
gration from England the relations of the business cycle were com- 
puted for total emigration from the United Kingdom, emigration 
from the United Kingdom to the United States, and the relation 
of British business cycles to American business cycles. 

The same process of computing the long-time trend, the cyc- 
lical variations, and seasonal variations had to be repeated as for 
the business data. The interest of the author was in the cyclical 
variations of social phenomena and their relations to cyclical 
changes in business. Where the data were given by quarter years, 
seasonal indexes had to be computed and the amount of change 
due entirely to seasonal conditions subtracted. In all cases the 
average increase per year over a long period of time was computed. 
When the amounts of seasonal change had been subtracted, then 
the actual variations in crime, pauperism, and the other series 
from the general trend represented cyclical fluctuations. The lat- 
ter are the specific data the investigator had been seeking through 
the long calculations up to this point. Finally, as in the case of 
the business series, the cyclical fluctuations were reduced to their 
respective units of variation (standard deviation) from their arith- 
metic averages. Now the business data and the social data are 
strictly comparable, but their exact relationships have not been 
calculated. 

Dr. Thomas shows the relationships between the business cycle 
and social cycles in two ways. First, she presents the cyclical fluctuations
graphically, showing the business cycle and marriage rates
or other social series on the same chart. Her Chart II, showing 
the relation of the business cycle to marriage rates and divorce 
rates, is reproduced below: 


Figure V. — Relation of Business Cycles to Marriage and Divorce Rates

The ups and downs of the curve for the business cycle are closely 
matched by the ups and downs of the curve for marriage rates. 
The similarity is less marked in case of divorce, but, though 
divorce follows two or three years behind the business cycle, there 
is considerable similarity in the form of the two curves. That is, 
the business cycle appears to influence to a marked degree marriage 
and divorce rates. But the chart permits only a rough estimate of 
the degree of relationship. If this degree of relationship is to be 
measured exactly, some other method must be found. The method 
adapted to an exact measurement of relationships of social phenomena
is correlation (see Chapter XI), and the measure of relationship
is called a coefficient of correlation. This method of measuring
relationships will not be discussed here in detail. It is





sufficient to state that a high degree of correlation was found be- 
tween the business cycle and marriage and divorce rates — higher 
for marriage than for divorce. When prosperity is high, it can be 
expected that the marriage rate is increasing and that the divorce 
rate will soon start to rise, if it has not already done so. When 
there is a depression coming on, marriage and divorce rates can be 
expected to decrease. Similar graphs were constructed and similar 
coefficients of correlation computed between the business cycle 
and the other social series. The text of the book discusses the prob- 
able significance of the relationships in each case and states con- 
clusions cautiously. 
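
A coefficient of correlation between two cycle series, with an optional lag, may be sketched as follows. This is an illustration only: the figures are hypothetical, the function names are not from the study, and the "divorce" series is deliberately built as a copy of the business series delayed by two years, so that the effect of the lag can be seen.

```python
from math import sqrt

def correlation(x, y):
    """Pearson coefficient of correlation between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def lagged_correlation(business, social, lag):
    """Correlate business in year t with the social series in year t + lag."""
    if lag == 0:
        return correlation(business, social)
    return correlation(business[:-lag], social[lag:])

# Hypothetical cycle values in standard-deviation units (illustration only).
business = [0.5, 1.2, 0.3, -0.8, -1.4, -0.2, 0.9, 1.1, -0.1, -1.0]
marriage = [0.4, 1.0, 0.5, -0.6, -1.5, -0.4, 0.8, 1.3, 0.0, -0.9]
# An artificial series that repeats the business cycle two years later.
divorce = [0.0, 0.0] + business[:-2]

r_marriage = lagged_correlation(business, marriage, 0)
r_divorce_lag2 = lagged_correlation(business, divorce, 2)
```

In this artificial case the lagged coefficient for the delayed series is practically 1, while the marriage series, which moves with business in the same year, gives a high coefficient without any lag.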

An important part of every statistical problem, after computations
are finished, is its presentation in clear, concise literary form.
Graphs, tables, and numerical results are included. Whether the 
study is to be published or not, it ought to be written up. Writing 
up a report of the work helps the investigator to clarify his own 
thinking about the problem, and it makes his work understandable 
to others who may be interested in it. The investigator knows 
more about his problem than does anyone else, and in the written 
presentation he can interpret his work, injecting whatever cautions
are necessary. Dr. Thomas’ book is an admirable example
of good presentation of statistical analysis of a problem. 

The steps in this statistical study of social aspects of the business 
cycle may be briefly summarized: (1) definition of the problem;
(2) study of previous discussions of the same problem; (3) exact 
statement of what contribution the investigator expects to make 
to the understanding of the problem; (4) determination of the 
kind of data required to solve the problem; (5) elimination of 
series which appear to be inaccurate or irrelevant; (6) computation 
of a general index of business, followed by elimination of trend 
and seasonal variations so that only cyclical variations will be left; 
(7) elimination of trend and seasonal variations in the social series, 
leaving only cyclical variations; (8) comparison of cyclical varia- 
tions of the business index and the various social series by means 
of graphs and correlation; (9) presentation of the study in literary 
form. Any other statistical problem requiring the use of secondary 
data would have its own peculiar variations from the procedure 
used by Dr. Thomas, but some of these steps will appear in almost 
any such problem. 




Part Two


STATISTICAL ANALYSIS 




CHAPTER V 


Collection and Assembling of Data 


I. DEFINITIONS 

A more complete account of methods of collection and assembling 
of statistical data may now be given. In neither of the problems 
discussed above were all the common methods of collecting and 
assembling data utilized. Yet these steps are primary in social 
statistics. Research in the social sciences is just as strong, and just 
as weak, as the accuracy of the data collected. However refined 
and elaborate the mathematical analysis of the data may be, it is 
of little value if the recorded observations are inaccurate or care- 
lessly made. A civil engineering student spends much time in the 
field with transit and chain, learning to make accurate observations. 
His measurements of elevation and distance are his data. If he 
fails to adjust his transit properly, errors are made. If his chain 
is a little short or if he does not measure exactly from his fixed 
points, errors render the work unreliable for engineering plans. 
It is no less true that the social investigator, whether he be social 
scientist, social worker, or social engineer, must have learned how 
to make accurate observations on the facts he is seeking. Further- 
more, after careful consideration, he must be able to discriminate 
between secondary data which are reasonably accurate and those 
which are unreliable. 

But accuracy is a relative term. It should not be thought that 
absolute accuracy is required in social statistics. That is an impos- 
sible attainment. In every problem there is a standard of accuracy 
essential to its solution. A few people are probably missed when 
the national census is taken, but that does not seriously impair 
the value of the enumeration of more than a hundred million 
individuals. Accuracy in the observation of attributes turns upon 
the degree of precision of definition of the attribute and upon the 
assiduity of the investigator. For example, in a statistical study 



of insanity the attribute, insanity, must be carefully defined. Are 
only those patients who are confined in public hospitals to be con- 
sidered? Or will private hospitals for mental patients provide some 
of the data? If they do, then types of insanity to be included will 
have to be decided upon, because the private hospital is likely to 
have a larger proportion of mild cases than the state hospital. It 
will not be sufficient to fall back upon the legal definition of in- 
sanity, because many people legally committable go to private in- 
stitutions. Are the out-patients of mental clinics to be regarded as 
insane? Some of them would undoubtedly be admitted to a state 
hospital if application were made. From these questions, it will be 
clear that the standard of accuracy in a study of insanity will be 
arbitrary, but it will be none the less necessary in order that the 
applicability of the conclusions may be determined. A study con- 
cerned with true variables does not escape the necessity of a stand- 
ard of accuracy. Suppose the problem is to determine the 
educational age of the children in a school — the educational age 
of a child is determined from the ratio of his school year to his 
chronological age. A child is eight years of age and is in the third 
grade. Shall we take his age in round years to the last birthday 
or to the nearest birthday? Or shall we express his age more 
exactly in years and months? If the child is being studied in 
December and he entered the third grade in September, should we 
use simply the whole number, 3, to express his grade, or should 
we conceive him to have moved from the round number to 3.3? 
This question must be decided before any data can be collected. 
When the standard of accuracy is decided, all the data must be 
collected with reference to this standard. These illustrations will 
serve to indicate what is meant by the standard of accuracy as a 
relative term. 
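
The alternative conventions mentioned above can be set side by side in a short sketch. The child's exact age (8 years 7 months), the rule that six months or more rounds up to the next birthday, and the ten-month school year are all assumptions made for illustration; the text itself fixes none of them.

```python
def age_last_birthday(years, months):
    """Age in completed years."""
    return years

def age_nearest_birthday(years, months):
    """Age to the nearest birthday; six months or more rounds up (assumed rule)."""
    return years + 1 if months >= 6 else years

def age_in_years(years, months):
    """Age expressed exactly in years and fractions of a year."""
    return years + months / 12

def grade_with_progress(grade, months_completed, months_in_school_year=10):
    """Grade plus the fraction of the school year completed (assumed 10-month year)."""
    return round(grade + months_completed / months_in_school_year, 1)

# Hypothetical child: 8 years and 7 months old, studied in December
# after entering the third grade in September (three months completed).
child = (8, 7)
last = age_last_birthday(*child)
nearest = age_nearest_birthday(*child)
exact = age_in_years(*child)
grade = grade_with_progress(3, 3)
```

The same child is recorded as 8, as 9, or as about 8.6 years of age according to the convention chosen, which is why the choice must be made once and applied to every case.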

Secondary data have already been collected. They were gath- 
ered for some purpose by the original collector. This purpose may 
be different from the one actuating the person now concerned 
with them. The investigator will collect his data in this case by 
assembling the publications in which the data occur. He should 
then determine the standard of accuracy observed in their origi- 
nal collection. Whatever this was, the present investigator cannot 
change it. If it was not sufficiently exact for his purposes, he cannot
use the data. On the other hand, if he thinks the data are exact 
enough for his purposes, he can proceed to use them but must 
make only such inferences as the standard of accuracy would 





warrant. He may manipulate the data in any way he wishes, but 
the form in which they are published will place some limitations 
upon him. For example, if his published data record ages in ten- 
year intervals, he cannot break them down and use five-year class- 
intervals, though he could add them and use twenty-year class- 
intervals. For this reason it is desirable that data, which are likely 
to be used by many investigators as secondary data, be published 
in the simplest form that anybody might want to use. 
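
The limitation can be shown with a small sketch. The counts are hypothetical, and the tuple layout (lower age, upper age, number of persons) is merely a convenient assumption. Adjacent class-intervals can always be added together, but nothing in a published ten-year total tells how its cases divide between the two five-year halves.

```python
def merge_adjacent(intervals, group_size=2):
    """Combine each run of `group_size` adjacent class-intervals into one."""
    merged = []
    for i in range(0, len(intervals), group_size):
        group = intervals[i:i + group_size]
        lower = group[0][0]
        upper = group[-1][1]
        count = sum(n for _, _, n in group)
        merged.append((lower, upper, count))
    return merged

# Hypothetical published table: (lower age, upper age, number of persons).
ten_year = [(0, 9, 120), (10, 19, 95), (20, 29, 80), (30, 39, 60)]
twenty_year = merge_adjacent(ten_year)
```

No corresponding function can be written for the reverse operation without some further assumption about how ages are distributed within each interval.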

2. COLLECTION OF PRIMARY DATA 

Some primary data may be collected through official agencies,
provided the investigator furnishes the report forms. This is a 
common occurrence when a public agency is interested in a piece 
of research being done by an outsider. The agency will agree to 
order the reports made in the form desired for their research 
project. In such cases the terms must be the simplest possible. If 
there is any question about the exact definition of terms, a list of 
definitions must be given to those who record the information. 
The tabulation card reproduced below is an example of a report 
form, where the meaning of the terms was so clear that no list of 
definitions was necessary. This card has spaces on the left-hand 
end for the information to be written in by a clerk of the Indian- 
apolis Department of Health which was cooperating with the 
author in this study of mortality. The Department of Health is 
asked to give the month of death, the age in years (date of birth is 
not always obtained on the physician’s certificate, but the age is 
given), sex, color, diagnosis, and census tract. This particular card 
is an example of simple machine tabulation, because the information 
to be punched is written on the card itself; this can be done, if the 
number of items is small and sufficient space is left on the card for 
recording the information. The department clerk should make no 
mistakes in transferring the information from the physician’s report 
to this card, because the terms are simple, objective, and capable of 
no misinterpretation. The only question of definition that ever arose 
was whether persons who lived out of the city but died in the 
city should be reported; these were not wanted, because this was 
a study of the mortality of inhabitants of the city of Indianapolis. 
All the deaths of residents of Indianapolis occurred in a census 
tract, or, if at a hospital, then the person had a residence in a 
census tract, and it was the residence tract that was wanted. 



[Figure: tabulation card for the study of mortality]




The tabulation card reproduced on page 102 is more compli- 
cated. It could not be given to the juvenile courts for entering 
the information desired. A report form, embodying the same items, 
was printed and sent to the courts. But the terms are not as un- 
ambiguous as those in the mortality study. Every term had to be 
defined with precision; a few terms are rather obvious in meaning,
but some explanation was necessary in all cases. “Offense” was 
to be called by its legal name. “Disposition of Case” was to be 
stated specifically: if the child was sent to an institution, the name 
of the institution was asked for; the case might be dismissed, or 
damages for property ordered paid; the child might be put on 
probation, or the case might be unofficial, in which case the word 
“unofficial” was written into the form for disposition of the case. 
“Age” was to be given to the nearest birthday. “Weight” was to 
be expressed in pounds, but “height” was to be expressed in feet 
and inches. Etc., etc. Even with specific definitions provided to the 
probation officers of the courts, some question was continually 
arising about special cases, or some official could not understand 
what was meant. 

Every business or social agency, public or private, which collects 
information about its work is faced with the same problem. The 
information believed to be important and reportable must be asked 
for in as simple language as possible. Terms must be explained 
carefully and sometimes often. The collection of such data is the 
first element in social bookkeeping, and it is basic to statistical 
analysis. One of the most important functions of state or city de- 
partments of public welfare and departments of health is the 
collection of statistics. Some of these statistics are collected 
monthly, some quarterly, and some annually. The department 
usually has authority to prescribe the form of the report. If it has 
competent administrators, they want the reports to show the condi- 
tion of public welfare and health work in the state. This requires 
certain facts which must be requested in simple, objective form. 
The more questions that can be answered by “yes” or “no” or by 
numbers, the better the report form. Of course, careful definitions 
of the items reported in numbers are necessary in order to secure 
comparable data. The report form for which the juvenile delin- 
quency tabulation card (see Fig. II) was designed was adopted 
as the official monthly report form of the Indiana State Probation 
Department and prescribed as the form on which the courts should 
make their reports. 




C. G. Form 19. Agent’s Report to Board.

BOARD OF CHILDREN’S GUARDIANS OF ________ COUNTY
REPORT OF AGENT FOR THE MONTH ENDING ________

1. Children Boarded with Mothers:                    Boys    Girls    Total
   a. Placed during month
   b. Discontinued during month
   c. Number remaining last day of month
   Number of mothers boarding their children

2. Foster Homes:
   a. Applications for children:
      Number received
      Number investigated
      Number approved
   b. Children placed during month:
      By Board of Children’s Guardians
      By Board of State Charities
   c. Number of children in foster homes subject to visitation
   d. Number of children in foster homes visited during current month
      found to be getting on well
   e. Number dropped from rolls:
      Adoption
      Death
      Marriage
      Over age
      Others (specify)

3. Institution Care:               (Beginning of Month)    (End of Month)
   Wards of the Board in the following named institutions:

4. Summary of Wards for Last Day of Month:           (End of Month)
   1. In mothers’ homes
   2. In foster homes
   3. In institutions or boarding homes
   Total

5. Financial Statement:                              Number    Expense
   a. Children boarded in own homes during month               $
   b. Children boarded in institutions during month            $
   c. Children boarded outside institutions                    $
   Total amount contributed during month by parents for support of children

6. Agent’s Activities:
   Has every ward in mothers’ homes been visited during the month?
   If not, which have not been visited?
   Give reason
   Total number of visits to homes
   Total number of office interviews

7. Miscellaneous work (specify):

(Signed)

Figure VII. — Report Form Used by the Boards of Children’s Guardians, 

Indiana 






Because such a large proportion of social data are collected by 
official agencies and because many students who expect to take posi- 
tions as statisticians will be associated with public departments, two 
samples of official report forms are reproduced here. The first, re- 
produced above, is the monthly report form used by the agent of 
county boards of children’s guardians in Indiana, and the second is 
the form filled out for admission of patients to the out-patient de- 
partment of the Indianapolis City Hospital. 


INDIANAPOLIS CITY HOSPITAL
Out-Patient Department
Form D7

Name
Age                         Race
Address
Ref. by
Reason for referring
No. in family
Adults                      Adults working
Children in School          Children under School age
Children working            Dependents
Occupations of those Employed
Income from Father          Mother          Children          Others
Expenses: Rent              Insurance       Installments
          Food              Fuel            Others
Remarks:

Signed

Figure VIII. — Registration Form 

Closely allied to the type of reports received by public agencies 
are the records kept by private social agencies. These agencies may 
not keep their records primarily for the purpose of reporting to 
a central collecting agency, but they keep records for their own 
use. A settlement house carries on a variety of activities, and the 
workers, as well as the board of directors, want to know periodi- 
cally what participation there has been in the different activities. 
A public health nursing association is interested in the number of 
different types of cases it handles, the cost of cases, and their 
location in the city. Its interest in statistics may be entirely administrative;
there may be no definite interest in statistical research.
But statistics are indispensable to the effective administration of 





the public health nursing association. Most such private agencies 
are related in some manner to a national organization, one of 
whose duties it usually is to develop standards, including standards 




Figure IX. — Statistical Card 


of statistical reporting. For several years the Family Welfare 
Association of America has been experimenting with various statis- 
tical cards. This Association is concerned primarily with family 





case work as treatment, but it is well aware of the desirability of 
reducing to statistical form all the data which lend themselves to 
such recording. Some of their data are qualitative and cannot be 
enumerated satisfactorily, but much of the useful information can 
be checked on a card. Then at the end of the year it is possible for 
the society to make a statistical summary of its work. A few of 
the larger societies are employing statisticians whose business it is 
to analyze the statistical data on the form card which each case 
worker keeps for each of her cases. 

The card which the Family Welfare Association is now recom- 
mending to its member agencies is reproduced below. 





Figure IX-A. — Reverse Side of Figure IX 

A questionnaire, technically defined, is a blank form mailed to 
the person who is expected to furnish the desired information. 
The response of the person interrogated is wholly voluntary. He 
may fill out the questionnaire and return it, or he may throw it in 
the wastebasket. A questionnaire may ask for information that is a 
matter of opinion and not capable of statistical expression. This 
kind of questionnaire is not under consideration here. Government 
bureaus and departments frequently do not have authority to com- 
pel the reporting of certain information which they want; in such 
cases they resort to the questionnaire method of collecting their 
data. This method is also used widely by private individuals and 
organizations having no official status. As stated above, the replies 





are always voluntary, and responses are usually received from
only a small percentage of the questionnaires mailed out. Govern- 
ment bureaus using this method probably get a higher percentage 
of replies than do individuals or private organizations, because the 
citizen is likely to feel some obligation to respond to a request of 
the government. The questionnaire should be so framed as to 

FORM 25 1

U. S. Department of Labor, Bureau of Labor Statistics, Washington

Dear Sir:

The Bureau of Labor Statistics is endeavoring to keep as accurate a record as possible
of all strikes and lockouts in the United States as they occur. We shall, therefore,
greatly appreciate your courtesy in furnishing as much as you can of the information
listed below, relative to the strike or lockout here indicated.

An addressed envelope on which no postage is required is inclosed for your reply.

Very Respectfully,

Commissioner of Labor Statistics

Schedule of Inquiry

1. State                              2. City or town
3. (a) Industry                       (b) Occupation
4. Strike or lockout?
5. Name of establishment (if more than one, give number)
6. Date of beginning                  7. Date of ending
8. Number of employees involved: Male        Female        Total
9. Cause or object, briefly stated
10. Result, briefly stated
11. If ordered by a labor organization, please give name
12. If settled by arbitration, please name Board
13. If terminated by a written agreement between employer and employees, will you
    kindly inclose a copy of the same?

1 United States Bureau of Labor Statistics, Methods of Procuring and Computing 
Statistical Information of the Bureau of Labor Statistics, Bulletin No. 326, 1923, p. 38.

Figure X. — Questionnaire of the U. S. Bureau of Labor Statistics 

make the person feel he has some interest in the subject investi- 
gated. This may be done by an explanatory note at the top or at 
the bottom, or in a letter. As few questions as possible should be 
asked. A short questionnaire may be filled out in a few minutes by
the person who has the information. On the other hand, some 
questionnaires contain several pages and dozens of questions which 
would require several hours of work to answer conscientiously, and 





few people will trouble to fill them out. If 10,000 questionnaires 
are mailed and only 1,000 are returned, there is always serious 
doubt whether the returns are sufficiently representative to be 
worth anything. For statistical purposes, the questions should lend 
themselves to “yes-or-no” answers or to answers in figures; opin- 
ions should be excluded, because they are non-statistical in nature. 

Two samples of questionnaires are given to illustrate the method 
of asking questions. 

Fig. X is put out as an official government form, but it is 
really a questionnaire in view of the facts that it is mailed to the 
person who is to furnish the information and that the response is 
voluntary. The purpose of it is stated in a short letter at the top 
of the questionnaire. The questions are few and require simple,
objective answers. Questions 9 and 10 are the only ones in any 
way involving matters of opinion, and usually both the employer 
and the employees in a strike or lockout have a reason that can 
be stated briefly. The questionnaire is mailed to both parties to the 
strike or lockout. If there is disagreement as to the cause or result, 
further inquiry can be made. 

The next questionnaire is also put out by the Bureau of Labor 
Statistics in connection with its current statistical record of indus- 
trial accidents. This form asks for information bearing on the 
amount of exposure of the employees to possibility of accident: 


FORM 26 1

Report of Employment

Company ________    Plant ________    Year ________

                                     If Total Hours Are Not Available,
                                     Report as Below
              Total Hours            ------------------------------------------
              Worked by All          Average     Days Department   Usual Length
Department    Men as Shown           Number      Was in            of Day
              by Time Books          Employed    Operation         or Turn
----------    -------------          --------    ---------------   ------------


1 Op. cit., p. 39.





Figure XI. — Questionnaire of the U. S. 

Bureau of Labor Statistics 





Another form is used for obtaining the number of persons injured 
and the amount of disability. Form 26 enables the Bureau to 
compute the liability to accidents, and with the other data obtained 
on the next form in its series (Form 27) it can estimate the in- 
crease or decrease of industrial accidents over a period of time. 
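
How the items on Form 26 might be turned into a measure of exposure can be sketched as follows. The sketch assumes that, when total hours are not reported, exposure is estimated as the product of the three alternative items on the form, and that accidents are expressed per million hours of exposure. Both are assumptions made for illustration, and the figures are hypothetical.

```python
def hours_of_exposure(total_hours=None, average_employed=None,
                      days_in_operation=None, hours_per_day=None):
    """Use reported total hours if available; otherwise estimate them
    from the three alternative items on the form (assumed method)."""
    if total_hours is not None:
        return total_hours
    return average_employed * days_in_operation * hours_per_day

def accidents_per_million_hours(accidents, exposure_hours):
    """Accidents relative to exposure, on an assumed base of 1,000,000 hours."""
    return accidents * 1_000_000 / exposure_hours

# Hypothetical department: 50 men, 300 days in operation, 10-hour day.
exposure = hours_of_exposure(average_employed=50,
                             days_in_operation=300,
                             hours_per_day=10)
rate = accidents_per_million_hours(accidents=6, exposure_hours=exposure)
```

Here the department has 150,000 hours of exposure, and six accidents give a rate of 40 per million hours. Rates of this kind, computed year by year, are what make an increase or decrease over time measurable.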

The survey schedule is similar to a questionnaire, but it is used 
differently and may be more complicated. A field worker takes the 
schedule and interviews the person who is to give information. 
The form used by the Government to take the national census is 
in fact a survey schedule, though it is not referred to as such. 
Surveys of farm houses have been made by the United States 
Bureau of Agricultural Economics. This Bureau is continually 
directing surveys in different parts of the country in connection 
with its studies of farm production and the standards of living of 
farmers. The land grant colleges carry on numerous surveys of 
rural communities or counties, or even surveys of some aspect of 
rural life on a state-wide basis. One of the most widely known 
urban surveys was the Pittsburgh Survey made in 1909-16. This 
survey was made with particular reference to the standard of 
living and working conditions of the industrial workers in and 
around the city of Pittsburgh. Recently the Russell Sage Founda- 
tion has published a directory of over two thousand social surveys 
which have been made in different parts of the United States. 1 In 
probably all of these the survey schedule has been an important 
means of recording the data necessary to analyze the problems 
under consideration. Certainly it is true of those that were care- 
fully planned and executed. “The schedule used by the field 
worker,” says Chapin, “is a mechanical device which is designed to 
provide him with a method of limiting or controlling his observa- 
tion and of standardizing the method of recording that observa- 
tion. In so far as inquiries on the schedule are put in a form which 
can be answered by a numerical or quantitative statement or by 
‘yes’ or ‘no,’ the subjective characteristics of the field worker which
may bias his opinion are eliminated.” 2 Chapin gives detailed de- 
scriptions of field work procedures in this work. The questions 
must be framed carefully so that they elicit nothing but objective 
replies, as Chapin suggests. The fact that a field worker carries 
the schedule and obtains answers to the questions on the schedule 

1 Eaton, Allen, and Harrison, Shelby M., A Bibliography of Social Surveys. 
Russell Sage Foundation, 1930. 

2 Chapin, F. S., Field Work and Social Research, pp. 49, 50. New York: The 
Century Co., 1920. 





by talking with persons who are familiar with the facts makes this 
method of securing information more reliable than the question- 
naire method, and it insures responses from a much higher per- 
centage of persons. The questions asked in a questionnaire, if at 
all complicated, are open to as many interpretations as there are 
persons replying, whereas, if the field worker has some bias which 
careful formulation of the schedule cannot entirely nullify, all 
schedules have the same bias. 

A good schedule used for the study of a social condition or 
situation reduces considerably the necessity of having highly 
trained field workers. If the investigator knows what he wants and 
if he wants something that can be objectively defined and studied 
by means of objective facts alone, he can organize a staff of untrained
workers to gather the material. This is regularly done
every ten years by the Bureau of the Census which conducts the 
most comprehensive of all surveys, the enumeration of the com- 
position and characteristics of the population of the nation. In 1930 
the Committee on Compensation for Automobile Accidents, under 
the auspices of Columbia University, conducted a survey of persons 
injured in 1928 and 1929 in automobile accidents in several differ- 
ent states. A few trained workers were used to direct the work in 
each locality, but much of the calling upon families was done by 
college students who had had no experience in making surveys. 
This was possible, because the questions asked for simple matters 
of fact, and all the field worker had to do was to be reasonably 
courteous, enlist the interest of the persons injured or their rela- 
tives, and record the answers to the questions on the schedule. 

Two schedules are reproduced below to illustrate the kind of 
questions that should be asked and the manner of asking them. 
The schedule used by the Committee on the Study of Compensa- 
tion for Automobile Accidents calls for a great deal of informa- 
tion. Most of the questions could be answered with a high degree 
of accuracy. In practice it was found that the questions concerning 
the expenses of the injured person could not be given precise an- 
swers, and resort had to be made to estimates of the amounts 
under different headings. The reliability of the data depended 
upon the ability and willingness of the injured person or some 
member of his family to answer the questions. The field workers 
rarely found any difficulty in getting him to talk. The schedule 
suggests another thing: that is, the complexity of such an apparently
simple social problem as compensation for automobile accidents.
Because of this complexity, the framing of the schedule
required much time and discussion before it was finally adopted.

SCHEDULE FOR THE STUDY OF COMPENSATION FOR AUTOMOBILE ACCIDENTS

 1. File # ______    2. Date of accident ______
 3. Name ______    Age ____    Sex ____
 4. Address ______
 5. Type of accident ______
 6. Injured was: pedestrian ____ owner driver ____ driver ____
    or a passenger who was: owner ____
        member of owner’s family ____
        member of driver’s family ____
        guest of owner ____
        guest of driver ____
 7. Driver was: owner ____ owner’s friend ____ owner’s chauffeur ____
    member of owner’s family ____ renter ____
 8. Fatal: immediately ____, after ____ hours, ____ days, ____ weeks
 9. Injury ______
10. Date of investigation ______    Injured is M S W D
11. Occupation when hurt ______    Earnings $ ____ week
12. Customary occup. during previous year ______    Earnings $ ____ week
13. When struck injured was: on way to or from work ____
    out for recreation ____ stealing a ride ____
    other (state explicitly) ______
14. Injured was struck by: hit and run driver ____ intoxicated driver ____
    out of state car ____ stolen car ____
15. Disability:
    In hospital: emergency treatment only ____; ____ days, ____ weeks
    No disability ____ temporary ____ able to resume regular
        duties ____ days ____ weeks after accident.
    Permanent total (state injured’s condition) ______
    Permanent partial: period of temporary total disab. ______
    Injured’s permanent condition ______
    Earnings since accident $ ____ week.
16. Expenses: (If any treatment was free, please indicate)
                                                              Paid
    Hospital                           $ ____    $ ____    ____ by ____
    Medical (doctor, nurse, drugs,
        X-ray, etc.)                   $ ____    $ ____    ____ by ____
    Wages of substitute for injured    $ ____    $ ____    ____ by ____
    Lost wages                         $ ____    $ ____    ____ by ____
    Funeral                            $ ____    $ ____    ____ by ____
    Property damage                    $ ____    $ ____    ____ by ____
    Other                              $ ____    $ ____    ____ by ____
    Total                              $ ____    $ ____
17. Compensation:
    Vehicle which struck injured was insured ____ not ins ____ not known ____
    Vehicle in which injured was riding was insured ____ not ins ____ not known ____
    Obtained by verdict ____
    Settlement, through efforts of attorney, with Ins. Co. ____
        driver ____ not known ____
    Direct settlement with Ins. Co. ____ owner ____ driver ____
    Received from Workmen’s Compensation Fund ____
    Compensation received ____ days/weeks after accident.
    Total recovery $ ____
    Injured received $ ____ less $ ____ paid toward expenses.
    Attorney received $ ____ less $ ____ paid toward expenses.
    From Work. Comp. Fund $ ____ per week for ____ weeks.
18. Pending:
    Suit ____
    Negotiations with Ins. Co. ____ owner ____ driver ____
    Recovery likely: yes ____ no ____
19. No Recovery:
    Nothing sought by injured ____
    Claim refused by attorney ____
    Claim ignored by owner ____ by driver ____
20. Reasons for No Recovery:
    Injured was struck by a Gov’t. or City vehicle ____
    Party II had “influence” at hearing ____
    Financial irresponsibility of Party II ____
    No witnesses ____
    Contributory negligence of Party I ____
    Minor injury ____
    Lack of funds to initiate proceedings ____
    Ignorance of recovery procedure ____
    “Just did not bother” ____
    Other ______
21. Received Aid From:
    Life Ins. $ ____    Accident Ins. $ ____
    Benefit Societies $ ____ per week for ____ weeks
    Other ______
22. Family Situation:
    Relationship    Age    Occup.    Salary    Contrib. to family income

    House rent $ ____ a month. House owned clear ____
    Buying house, payments $ ____ a month
    Lodgers: ____ pay $ ____ a week.
    Boarders: ____ pay $ ____ a week.
23. Effects of accident:
    Wife or mother went to work ____
    Family borrowed money $ ____ from ____
    Used savings $ ____ Other ______
24. Has injured ever been in a m.v. accident before? ____ How often ____
    As driver ____ As passenger ____
25. Short story showing type of home and condition of family. (Use reverse side of this
    sheet if necessary)

Interviewed ______    Investigator's initials ______

Figure XII

The next schedule was used for one part of a study of dependent 
and delinquent children in North Dakota and South Dakota by 
the United States Children’s Bureau. This schedule was used in 
the study of children under the care of institutions and agencies.
Notice that when an answer requires opinion, such as “interest of 
relatives,” an objective indication of interest is suggested, namely, 
the frequency with which the parents visit the child. Physical, 
mental, and behavior characteristics are less exactly definable than
are some other facts; so the field worker was to indicate whether
the answer was a result of examination by a qualified person or 




not. The implication is that, if not by examination, then the 
diagnosis is less trustworthy. 

I. CHILDREN UNDER CARE OF INSTITUTIONS AND AGENCIES 1

Schedule No. ____

Institution or agency ______    Date ____
Name of child ______    Sex ____    Race ____
Date of birth ____    Age ____    Birthplace ____
Date received ____    Source ____    Perm. or Temp. Care ____
Nationality — Father ____    Mother ____
Reason for receiving ______
Maintenance: by State ____ County ____ Family ____
    Agency (specify) ____ Other ____
Length of time in inst. or under agency care (dates, etc.) ______
Disposition (specify in chronological order, placements, parole, released, adoption,
    boarding, with relatives) ______
Interest of relatives (frequency of visits to inst.; agency visits to child’s original home
    and present home, etc.) ______
Family and home conditions: (check correct answer)
    Mother — dead, living, married, widowed, divorced, separated, deserted
    Father — dead, living, married, widowed, divorced, separated, deserted
    Economic conditions ______
    Other ______
Child’s characteristics: (Specify if examination)
    Physical ______
    Mental ______
    Behavior ______
Child’s social history:
    School attendance ______
    Dependency ______
    Delinquency ______
Present address of child ______    Removed from ______
Present address of parents ______

1 U. S. Children’s Bureau, “Dependent and Delinquent Children in North Dakota
and South Dakota,” Publication No. 160, p. 124.

Note: As published in the bulletin the schedule was rather condensed. It has been 
expanded here to somewhat the form which the field worker might use. 

Figure XIII. — Schedule Used in a Child Welfare Study 


In connection with forms for securing information for later 
statistical analysis, the score-card should be mentioned. It is a sort 
of schedule, but is extremely simple and not susceptible of mis-
interpretation. All the questions are answered by “yes” or “no.”
No answers are checked. Each item of the score-card is assigned 
an arbitrary weight so that a measure of the relative importance 
of the items may be obtained and used in the analysis of the 
problem. 
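
As a rough illustration of how such a weighted score-card might be totaled, the following Python sketch scores a set of "yes" or "no" answers. The items and weights are hypothetical and were invented only for this example; they are not taken from any actual score-card.

```python
# Hypothetical score-card: every item is answered "yes" or "no",
# and every item carries an arbitrary weight (assumed values).
WEIGHTS = {
    "running water in house": 3,
    "separate sleeping room for child": 2,
    "child attends school regularly": 5,
}


def score(answers, weights=WEIGHTS):
    """Return the weighted score: the sum of the weights of items answered 'yes'."""
    total = 0
    for item, weight in weights.items():
        answer = answers[item]
        if answer not in ("yes", "no"):
            raise ValueError(f"answer for {item!r} must be 'yes' or 'no'")
        if answer == "yes":
            total += weight
    return total


example = {
    "running water in house": "yes",
    "separate sleeping room for child": "no",
    "child attends school regularly": "yes",
}
print(score(example))  # 3 + 5 = 8
```

Because every answer is either "yes" or "no," the only judgment left to the analyst is the choice of weights, which is why they are kept in one place in the sketch.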


3. ASSEMBLING DATA 

After the data have been collected on questionnaires, schedules, 
or official forms, they must be assembled for analysis. Tabulation 
is not a single step but includes everything from assembling the 
data punched on a tabulation card or tallied on a sheet of paper 
to the final form of frequency or other tables. In this chapter we 
are concerned only with the preliminary step, namely, assembling 
the data punched or tallied on a work sheet (frequency distribu- 
tions and setting up tables will be discussed in the next chapter). 
The tabulation card and the machines have already been described 
(see pp. 101-106). There are other machines which record classifi- 
cations, totals and sub-totals by means of a printing device. The 
machine arranges the cards in any order desired, but, with the sort- 
ing machine alone, the operator has to count the cards either 
by hand or on the machine. Then the total items are recorded on 
a work sheet. For example, in the problem of crime discussed in 
a previous chapter, one of the things wanted was the number of 
criminals living in each census tract of the city. Each of the 108 
census tracts had a symbol on the card and was punched. If the 
machine is set on column 7 (see p. 85), it arranges all the cards 
in numerical order for units place. After they are run through,
the cards are gathered up from the pockets of the machine, one 
pocket having all cards with the figure 1 in units place, another 
those with figure 2, etc. They are kept in order, placed in the 
machine, which is set on column 6, and they are run through 
again. Now they are in order for both units and tens places. Once 
more gathering the cards up from the pockets, the machine is set 
on hundreds place, and they are run through again. Now the cards 
are arranged according to census tracts from 1 to 108. The operator 
may count the cards for each tract by hand, or, if a large number 
of cards come in one tract, they may be counted on the machine. 
A work sheet has been prepared with the tract numbers arranged
vertically on the left-hand side, and spaces to the right are left to 
record the number of items in each tract. Ages may be tabulated 




in the same way. The following is a work sheet for recording resi- 
dences and places of offense: 


Tract     Criminals Living in     Offenses Committed in
            Specified Tract          Specified Tract

  1               4                        4
  2               6                        6
  3               1                        1
  4               2                        4
  5               1                        1
  6               0                        5
  7               5                        4
  8               9                        0
  9               2                        1
 10               5                        4

Figure XIV. — Work Sheet for Assembling Crime Data Sorted
on a Tabulating Machine


This work sheet does not differ from some tables. If some other 
items were tabulated, they could be given in any detail desired, 
and then tables could be made up to group them in different ways. 
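
The three runs through the sorting machine described earlier (units place, then tens, then hundreds) can be imitated in a few lines of Python. This is only a sketch: the card layout, the field name `tract`, and the sample tract numbers are assumptions made for the illustration, and a real sorter of course works on punched columns rather than on numbers.

```python
# Simulate sorting punched cards on a sorting machine, one column at a time.
# Each "card" is a dict; the tract number stands for three punched digits
# (hundreds, tens, units). The field name "tract" is an assumption.


def sort_pass(cards, place):
    """One run through the sorter: drop each card into pocket 0-9 according
    to the digit in the given place (1 = units, 10 = tens, 100 = hundreds),
    then gather the pockets in order, keeping the cards in order in each."""
    pockets = [[] for _ in range(10)]
    for card in cards:
        digit = (card["tract"] // place) % 10
        pockets[digit].append(card)
    return [card for pocket in pockets for card in pocket]


def sort_by_tract(cards):
    """Three passes: units place, then tens place, then hundreds place."""
    for place in (1, 10, 100):
        cards = sort_pass(cards, place)
    return cards


def count_by_tract(sorted_cards):
    """Count the cards in each tract, as the operator does for the work sheet."""
    counts = {}
    for card in sorted_cards:
        counts[card["tract"]] = counts.get(card["tract"], 0) + 1
    return counts


cards = [{"tract": t} for t in (108, 7, 23, 7, 101, 23, 3, 23)]
ordered = sort_by_tract(cards)
print([c["tract"] for c in ordered])  # [3, 7, 7, 23, 23, 23, 101, 108]
print(count_by_tract(ordered))        # {3: 1, 7: 2, 23: 3, 101: 1, 108: 1}
```

The procedure works only because the cards are kept in order when they are gathered from the pockets between runs; if a pocket were shuffled, the order established by the earlier runs would be lost.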

Hand tabulation would be different and more laborious. Punch 
cards would not be used at all. The worker would tabulate directly 
from the schedule, questionnaire, or official report to the work 
sheet. Sometimes the data are transferred to small cards, substi- 
tutes for machine cards, for convenience in hand sorting. The 
following work sheet will illustrate this procedure: 


Tract     Criminals Living in     Offenses Committed in
            Specified Tracts         Specified Tracts

  1           IIII                     IIII
  2           IIII/ I                  IIII/ I
  3           I                        I
  4           II                       IIII
  5           I                        I
  6           0                        IIII/
  7           IIII/                    IIII
  8           IIII/ IIII               0
  9           II                       I
 10           IIII/                    IIII

Figure XV. — Work Sheet for Assembling Crime Data — Hand
and Tally Method


This method of transferring the individual items from the sched- 
ule or questionnaire to a work sheet, which is the first step in 
tabulation, is called tallying in. When only a small number of 





items are involved, this method is satisfactory. The larger the 
number of items to be tabulated, the more time-consuming and 
expensive it is. But machines are not always available to students 
or research workers, whereas this method can always be followed. 
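
For readers who have a computer rather than a sorting machine at hand, tallying in amounts to counting how often each class occurs. The sketch below uses Python's standard `collections.Counter`; the six schedule records are made up for the illustration, and the rendering `IIII/` is merely one way of typing a crossed group of five strokes.

```python
from collections import Counter

# Each schedule is reduced to (tract of residence, tract of offense).
# These few records are hypothetical.
schedules = [(1, 1), (2, 2), (1, 4), (2, 2), (4, 1), (2, 7)]

residence_tally = Counter(res for res, _ in schedules)
offense_tally = Counter(off for _, off in schedules)


def tally_marks(n):
    """Render a count as tally marks; 'IIII/' stands for a crossed group of five."""
    fives, ones = divmod(n, 5)
    groups = ["IIII/"] * fives
    if ones:
        groups.append("I" * ones)
    return " ".join(groups)


for tract in sorted(set(residence_tally) | set(offense_tally)):
    # A Counter returns 0 for a tract that never occurs.
    print(tract,
          tally_marks(residence_tally[tract]) or "0",
          tally_marks(offense_tally[tract]) or "0")
```

The printed lines have the same layout as a hand work sheet: the tract, the tally of residences, and the tally of offenses.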

4. EXERCISES 

1. Take a published piece of research, selected by the instructor or 
by the student, read it carefully, and list the steps in the pro- 
cedure from the formulation of the project to the written 
report. 

2. Draw up a form for an official report: 

(a) For a probation officer who has to assemble information 
for the juvenile court judge on a child who is to appear 
in court. 

(b) For the principal of a school who has to report to the 
superintendent attendance for the week at her school. 

(c) For a public poor relief official who has to report his cases 
monthly to a board of commissioners. 

3. Draft a questionnaire to be sent to ministers in connection with 
a study of religious education. 

4. Draft a schedule for a survey: 

(a) Of housing conditions in your city or community. 

(b) Of boys selling papers on the street. 

(c) Of children attending neighborhood motion picture shows. 

(d) Of delinquent girls brought before the juvenile court in a 
certain year. 


5. REFERENCES 

Chaddock, R. E., Principles and Methods of Statistics, Chap. XIV.
Chapin, F. S., Field Work and Social Research, Chaps. III, IV, VII.
Lundberg, G. A., Social Research, Chaps. VI, VII.
Schluter, W. C., How to Do Research Work.



CHAPTER VI 


Tabulation of Statistical Data 


1. TABULATION AND CLASSIFICATION

Tabulation has two meanings: first, the transfer of data from 
original schedules or reports to a work sheet or a machine card; 
and, second, the arrangement of data in tables. The first use of the 
term is due largely to the introduction of machines which were 
called by the manufacturers tabulating machines. Some of these 
machines do actually print summaries of the data as the cards are 
sorted, but the sorting machines simply arrange the punched cards 
in order, and the totals have to be written down by hand in some 
form of a table. In order to distinguish these two processes, the 
first one was discussed in the preceding chapter and referred to as 
assembling statistical data. The second kind of tabulation is the 
subject of the present chapter. 

Logically tabulation is the fourth step in the study of a statistical 
problem for which data have been gathered by the investigator. 
Classification is the first step. This has to be roughly done before 
the schedule, questionnaire, or report form can be devised. For 
example, it is proposed to study the distribution of felonious crimes 
in Indianapolis. What classes of data are required to describe the 
distribution? Whatever data are needed for this purpose must be 
asked for in the schedule or report form. Distribution may refer 
to geographical distribution of all felonies without regard to type 
of offense, or it may refer to the distribution of the residences of 
the criminals only. On the other hand, it may refer to distribution 
of felonies by type of offense and by place of offense, or distribu-
tion may be by age, sex, race, nationality, and time also. In the
study made in Indianapolis for 1930 distribution was conceived in 
terms of types of offense, residence of the offender, place of the 
offense, age, and sex. These were subclassifications of data under 
the general class, distribution of crimes. The schedule was drawn 





up accordingly. The second step was collection of the required 
data. The third step was punching the information on machine 
cards and then assembling it on work sheets. The fourth step was 
tabulation. 

Four general classifications are used in social statistics: chrono- 
logical, geographical, magnitudinal, and qualitative. In Chapter 
III a dichotomous division of all statistical data was made, namely, 
attributes and variables. Chronological classes are usually variables, 
but not always so. Geographical classes are frequently not varia- 
bles, but are attributes determined by political considerations. 
Magnitude classes are always variables. Qualitative classes are 
never variables; they are attributes, the definitions of which may 
be sufficiently precise for the profitable application of statistical 
methods of analysis. Of course, any number of subclassifications of 
the four main classifications mentioned above may be made; the
number will depend upon the purpose in the mind of the investi- 
gator. The important point here is that the process of classifying 
the data will be almost complete long before the stage of tabulation 
is reached. If the data are in sufficient detail, they may be recom- 
bined in various ways to give new classes at the time of tabulation, 
but this too precedes the construction of tables, though it may 
come after the collection and assembling of the data. 

2. CONSTRUCTION OF TABLES 

A table is drawn on a flat surface, generally rectangular in form, 
ruled according to the requirements of the data. But certain steps 
are to be taken before the ruling is done. The worker must decide 
what captions are necessary and what is to be represented in the 
stub of the table. But care should be taken in thinking out the 
captions and stubs so as not to make the table too elaborate. A 
table is, after all, designed to simplify and summarize, and this 
purpose is defeated when it becomes too complex. This point can 
be discussed best from the table below for purposes of clarity. 
The captions are the headings in the spaces at the top of the table; 
they indicate the nature of the data contained in the columns. The 
“Year” and “Leather and Its Products” are the major captions, 
and they are coordinate. “Group Index,” “Leather,” and “Boots 
and Shoes” are captions subordinate to “Leather and Its Products.” 
That is, they are subdivisions of the major caption, but they are 
coordinate with respect to each other. “Employment” and “Pay-roll
Totals” are captions subordinate to the subdivisions of “Leather
and Its Products.”

TABLE III

Indexes of Employment and Pay-Roll Totals in Manufacturing Industries
Concerned with Leather and Its Products, Yearly Averages, 1923 to 1929 1

                              Leather and Its Products

           Group Index            Leather           Boots and Shoes

Year    Employ-   Pay-roll    Employ-   Pay-roll    Employ-   Pay-roll
        ment      Totals      ment      Totals      ment      Totals

1923    110.7     113.9       109.6     107.0       111.1     117.0
1924    100.3     100.6        96.9      95.7       101.6     102.8
1925    101.9     101.8        98.7      97.5       102.9     103.6
1926    100.0     100.0       100.0     100.0       100.0     100.0
1927     97.9      97.4        98.4      97.2        97.7      97.6
1928     92.8      89.7        95.4      93.7        91.9      88.0
1929     92.8      89.9        92.2      93          92.9      89.0

1 Monthly Labor Review, Vol. 30, No. 2, p. 186. The data here reproduced are taken
from a larger table.

The stub, or the first column in the
table, gives the second variable: time. “The units in which the
measurements are made,” says Secrist, “generally, although not 
always, appear in the ‘caption’: that is, in the vertical classes. The
ways in which the measurements are presented generally, although
not always, appear in the ‘stub’ — the horizontal classes. A tabu-
lated datum, therefore, is found at the intersection of the vertical
and horizontal axes.” 1 A table, therefore, has two dimensions:
vertical and horizontal. The characteristics of the data are given 
in the vertical dimension, or the columns, and the point of view 
from which they are regarded in the horizontal dimension, or the 
rows. A table thus in some degree presents an analysis of the 
data. Too much care cannot be given to the determination of cap-
tions and their relations of coordination and subordination; the
clarity of the table depends upon this process. 
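
The relation between stub, captions, and tabulated datum can be made concrete with a small Python sketch. It stores two rows of Table III, keyed by the stub entry (the year) and by a pair of captions (the subdivision and the subordinate caption under it). The names `CAPTIONS`, `ROWS`, and `datum` are illustrative choices, not part of any established method.

```python
# Captions of Table III below the major caption "Leather and Its Products":
# each column is identified by (subdivision, subordinate caption).
CAPTIONS = [
    ("Group Index", "Employment"), ("Group Index", "Pay-roll Totals"),
    ("Leather", "Employment"), ("Leather", "Pay-roll Totals"),
    ("Boots and Shoes", "Employment"), ("Boots and Shoes", "Pay-roll Totals"),
]

# Stub entry (year) -> the six figures of that row, as printed in Table III.
# Only two years are entered to keep the example short.
ROWS = {
    1924: [100.3, 100.6, 96.9, 95.7, 101.6, 102.8],
    1927: [97.9, 97.4, 98.4, 97.2, 97.7, 97.6],
}

table = {year: dict(zip(CAPTIONS, values)) for year, values in ROWS.items()}


def datum(year, subdivision, caption):
    """Return the figure at the intersection of a stub entry and a column."""
    return table[year][(subdivision, caption)]


print(datum(1927, "Leather", "Employment"))  # 98.4
```

Each datum is reached by naming one stub entry and one complete chain of captions, which is the sense in which a table has two dimensions.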

Another point to be kept in mind is that coordinate captions 
may be both general and specific. “Group Index” is a general
caption covering employment and pay-rolls in all leather and
leather goods factories, but the coordinate captions, “Leather” and
“Boots and Shoes,” are specific; they are included in the group
index but are separated for detailed analysis and presentation.
When this kind of tabulation is necessary, the general class of data
should be given in the first column to the right of the stub; the
specific data are then in columns to the right of the general class. 

1 Secrist, Horace, An Introduction to Statistical Methods, pp. 128, 129. New
York: Macmillan, revised edition, 1929. 





This is a matter of convenience for two reasons: first, any reader 
will be interested in getting a general picture of the problem, 
before he goes to details; second, probably more people are inter-
ested in the general aspect alone than in both general and specific
aspects. Furthermore, as a technical matter, it is the accepted mode 
of tabulation among statisticians. 

A similar procedure is observed, when the totals of columns are 
published. The following table shows this fact: 

TABLE IV

Poor Asylum Inmates Classified by Age and Sex, August 31, 1929,
Indiana 1

Age Group             Both Sexes       Male       Female

All Ages                 4,156         2,904       1,252

Under 3 years                6             4           2
3 and under 17               9             4           5
17 and under 30             75            35          40
30 and under 45            317           178         139
45 and under 60            832           556         276
60 and under 75          1,616         1,195         421
75 and over              1,264           908         356
Age not given               37            24          13

1 Arranged from data in the Indiana Bulletin of Charities and Corrections, No. 182, p. 302.


The totals are given for “Both Sexes” and for “Male” and “Fe- 
male” at the top of the table. This enables a reader to see at a 
glance the number of persons who were given care in the poor 
asylums, which is often the only fact wanted by a reader. This 
table exhibits again the general caption with specific captions which 
are placed to the right of the general caption. 
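
The totals that head Table IV can be checked, or produced in the first place, by simple addition. The following Python sketch enters the male and female figures of the table and derives the "Both Sexes" column and the "All Ages" row from them; the variable names are arbitrary.

```python
# Table IV: age group -> (male, female), as printed in the table.
ROWS = {
    "Under 3 years": (4, 2),
    "3 and under 17": (4, 5),
    "17 and under 30": (35, 40),
    "30 and under 45": (178, 139),
    "45 and under 60": (556, 276),
    "60 and under 75": (1195, 421),
    "75 and over": (908, 356),
    "Age not given": (24, 13),
}

male_total = sum(male for male, _ in ROWS.values())
female_total = sum(female for _, female in ROWS.values())
both_total = male_total + female_total

# General class first: the row of totals at the top, "Both Sexes" at the left.
print(f"{'All Ages':<18}{both_total:>8}{male_total:>8}{female_total:>8}")
for group, (male, female) in ROWS.items():
    print(f"{group:<18}{male + female:>8}{male:>8}{female:>8}")
```

The first line printed reads 4156, 2904, and 1252, agreeing with the "All Ages" row of the table.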

The title of the table is important. It should be brief but should 
indicate the main facts given. It is not necessary that the title be 
a complete sentence; few titles of tables in standard statistical
publications are complete sentences. The title of Table III does not
attempt to give the details presented in all columns: it gives the
general characteristics of the subject, namely, indexes of employ-
ment and pay-rolls in leather and leather-products industries; and
the viewpoint from which they are presented, namely, the years 
1923 to 1929. 

The ruling of the table is determined to a large extent by the 
relations of the various captions. If vertical lines are used, they are 
dropped from the horizontal line which underlines a more general 





caption. The line which separates the stub from the columns to the 
right is dropped from the topmost horizontal line. The topmost 
horizontal line is either one heavy line, or a double line. Some 
authors draw a double horizontal line between the lowest caption 
and the data in the columns. If this is done in tables which have 
totals, as Table IV, the double line is below the row of totals. The 
best practice regarding the bottom of the table is to draw either a 
single or a double horizontal line. If the bottom is left without a 
line, the table has the appearance of incompleteness. The ends of 
the table are generally left open, though some authors prefer to 
enclose the whole table by using end lines. 

Footnotes to tables are generally placed in small type immedi- 
ately below the table. They may be placed at the bottom of the 
page, but it is more convenient to place them nearer the table. 
The footnote may be only for the purpose of giving credit to the 
source from which the data are taken, or it may be to explain 
some unusual variation in the data which might escape the reader 
or might even be impossible for him to discover from the table 
at all. Everything about the table should be perfectly clear to the 
reader without the necessity of his debating in his mind the mean- 
ing of the author. 

Tables may be classified as to whether they serve a general or 
a specific purpose. This distinction is important, when the worker 
prepares his table, because the users of the two types of tables are 
not the same. Discussing this subject, Mudgett says, “The descrip- 
tive terms used indicate the difference between the two types, the 
general-purpose table being designed as a repository of the tabu-
lations in full detail; whereas the analysis table [or special-purpose
table] is intended, as the name suggests, to present the results of
analysis, to give not necessarily or always full detail, but sum-
maries or conclusions and significant relationships.” 2 The tables in
the decennial publications of the United States Census are general- 
purpose tables. They are intended for thousands of persons whose 
interests in them vary widely. Public health statisticians want to 
know the details of age distribution so that they can calculate 
specific birth and death rates. Business men want to know the 
changing population by geographical areas so that they can esti- 
mate the future of their business in different parts of the country. 
Educators and social workers want to know the details about 

2 Mudgett, Bruce D., Statistical Tables and Graphs, p. 30. Boston: Houghton
Mifflin Co., 1930. 





school attendance and child labor. Students of population want to 
know the details of age groups by sex and the division into rural 
and urban population. The detailed data of the census tables may 
be rearranged into less detailed groupings, but if published in 
large groupings they could not be broken down into details. Many 
statistical reports of federal, state, and city departments publish 
general-purpose tables so that their data may be of the widest 
possible use. The special-purpose table may give only averages, 
percentages, or coefficients of correlation, or it may give the orig- 
inal data in frequency classes suitable to the purpose in hand, but 
too general for the use of many other workers. The special-purpose 
table, as Mudgett suggests, presents the results of analysis and 
conclusions. 
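
The point that detailed classes can be combined into broader ones, but not the reverse, may be illustrated with the "Both Sexes" figures of Table IV. In the Python sketch below the three broad age groups are an arbitrary choice made for the example; any other grouping could be written into the mapping.

```python
# "Both Sexes" column of Table IV, in full detail.
DETAILED = {
    "Under 3 years": 6,
    "3 and under 17": 9,
    "17 and under 30": 75,
    "30 and under 45": 317,
    "45 and under 60": 832,
    "60 and under 75": 1616,
    "75 and over": 1264,
    "Age not given": 37,
}

# Assumed special-purpose grouping: detailed class -> broad class.
BROAD = {
    "Under 3 years": "Under 17",
    "3 and under 17": "Under 17",
    "17 and under 30": "17 and under 45",
    "30 and under 45": "17 and under 45",
    "45 and under 60": "45 and over",
    "60 and under 75": "45 and over",
    "75 and over": "45 and over",
    "Age not given": "Age not given",
}


def regroup(detailed, mapping):
    """Add the detailed frequencies together within each broad class."""
    combined = {}
    for group, count in detailed.items():
        broad = mapping[group]
        combined[broad] = combined.get(broad, 0) + count
    return combined


print(regroup(DETAILED, BROAD))
# {'Under 17': 15, '17 and under 45': 392, '45 and over': 3712, 'Age not given': 37}
```

Nothing in the combined figures would allow the 3,712 persons aged 45 and over to be divided again among the three original classes, which is why the general-purpose table is published in full detail.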

Many people dislike to read a book or an article containing statis- 
tical tables. It appears to them formidable. For popular purposes 
the book or article without statistics has its place, but for the part 
of the public interested in knowing the facts about a subject statis- 
tical tables are essential. They enable the statistician to present his 
findings in brief space. How many pages of text would it take to 
present the facts brought out in Table III above? It would take 
quite a number, and, when the text was written, the reader would 
not have as clear an idea of the facts as he can now get in a few 
minutes’ study. The table is indispensable for the presentation of 
masses of data, and the student should become accustomed to 
reading tables as a matter of course, and he should learn to think 
of his own data in terms of tables. 

3. THE ARRAY 

As data appear on a work sheet, they are unorganized. The 
student can have no idea of their meaning, until they are arranged 
in some orderly manner. Likewise published data may have a di- 
rect bearing upon a problem, but may not be in the order required 
for the purpose in hand. They must be reorganized to satisfy the 
requirements of the problem under consideration. Table V gives 
the number of jail prisoners per 100,000 population in Indiana, 
October 1, 1928, to September 30, 1929, according to counties. 
It is obvious that the arrangement of counties in alphabetical order 
has no significance in so far as the occurrence of imprisonment in 
jails is concerned. No conception of the average rate of such 
imprisonments can be obtained from this table. 




TABLE V

Jail Prisoners per 100,000 Population in Indiana by Counties, October 1, 1928,
to September 30, 1929 1

County          Prisoners per      County          Prisoners per
                100,000 Pop.                       100,000 Pop.

Adams                389           Lawrence            2,348
Allen                812           Madison             1,412
Bartholomew                        Marion              1,614
Benton               493           Marshall
Blackford          1,044           Martin                850
Boone              1,690           Miami               1,955
Brown              1,856           Monroe              2,637
Carroll              543           Montgomery          1,879
Cass               1,817           Morgan              1,347
Clark              2,108           Newton                893
Clay                 622           Noble                 388
Clinton              902           Ohio                1,968
Crawford             358           Orange              1,070
Daviess            1,052           Owen                  952
Dearborn           1,530           Parke                 836
Decatur              999           Perry                 887
Dekalb               689           Pike                  623
Delaware           2,180           Porter              1,253
Dubois               285           Posey               1,092
Elkhart              487           Pulaski             1,344
Fayette            2,525           Putnam              2,010
Floyd              1,551           Randolph              878
Fountain             708           Ripley                324
Franklin              56           Rush                1,061
Fulton                             St. Joseph          1,256
Gibson               744           Scott                 894
Grant              2,275           Shelby
Greene               561           Spencer               665
Hamilton             948           Starke                904
Hancock            3,637           Steuben               981
Harrison             433           Sullivan            1,497
Hendricks            978           Switzerland           497
Henry              2,477           Tippecanoe          1,451
Howard             1,547           Tipton                784
Huntington           935           Union               1,494
Jackson              515           Vanderburgh            51
Jasper             1,215           Vermilion           1,113
Jay                  475           Vigo                3,826
Jefferson            843           Wabash                871
Jennings             368           Warren                793
Johnson            1,270           Warrick
Knox               1,533           Washington            831
Kosciusko            503           Wayne
Lagrange             634           Wells                 513
Lake               1,158           White                 383
Laporte              792           Whitley               659

1 Rates computed from data of Indiana Bulletin of Charities and Corrections, No. 182,
pp. 307, 308.





The simplest form of orderly arrangement would be in the 
form of an array, that is, listing them in order of magnitude from 
lowest to highest. Table VI presents the jail rates as an array with 
the names of the counties omitted: 

TABLE VI

Jail Prisoners per 100,000 Population in Each County of Indiana, October 1,
1928, to September 30, 1929, Arrayed According to Rate

            Jail Prisoners per 100,000 Population

      51          659          952        1,530
      56          665          978        1,533
     285          689          981        1,547
     324          708          999        1,551
     358          744        1,044        1,614
     368          784        1,052        1,626
     383          792        1,061        1,690
     388          793        1,070        1,817
     389          812        1,091        1,856
     433          814        1,092        1,879
     475          831        1,113        1,955
     479          836        1,158        1,968
     487          843        1,215        2,010
     493          850        1,253        2,108
     503          871        1,256        2,180
     513          878        1,270        2,275
     515          887        1,344        2,348
     543          893        1,347        2,477
     561          894        1,412        2,525
     600          902        1,451        2,637
     622          904        1,494        3,637
     623          935        1,497        3,826
     634          948        1,515


From the array it is easy to see that the rates below 1,000 pre- 
dominate and that there are few counties with rates of over 2,000. 
Two extremely low rates and two extremely high rates appear. 
The two lowest rates are proportionately so much lower than the 
next highest that it is probable some extraneous factor in recording 
and reporting is responsible for the difference. The two highest 
rates are not so much higher than the rates just below them as to
appear impossible. The array shows up still better in graphic
form. Figure XVI presents the above data graphically. 
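
Forming an array is nothing more than sorting the items by magnitude. As a brief sketch, the Python lines below take ten of the county rates as they stand in Table V (the choice of these ten counties is only for brevity) and arrange them from lowest to highest.

```python
# Ten county rates in the alphabetical order of Table V.
rates_by_county = {
    "Adams": 389, "Allen": 812, "Benton": 493, "Carroll": 543, "Cass": 1817,
    "Clark": 2108, "Clay": 622, "Clinton": 902, "Crawford": 358, "Daviess": 1052,
}

# The array: the same rates in order of magnitude, county names dropped.
array = sorted(rates_by_county.values())
print(array)  # [358, 389, 493, 543, 622, 812, 902, 1052, 1817, 2108]

# Lowest rate, highest rate, and the range between them.
print(array[0], array[-1], array[-1] - array[0])  # 358 2108 1750
```

Once the rates are in order, the lowest and highest values and the range between them can be read off directly, which the alphabetical listing does not permit.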

Examination of this figure reveals the wide range from the low- 
est to the highest rates of jail imprisonment. Possible explanations 
of the large differences are many: (1) there may be real differences
in the tendency to crime and delinquency in various counties;
(2) there may be wide differences in the strictness with which the 





law is enforced; (3) some communities may permit bail more
easily than others; (4) differences in reporting jail imprisonments
may account for some differences. It is obvious that the array of 
imprisonment rates of counties does not explain why differences 
occur but simply makes clear that they exist. One of the functions 



of statistics is to reveal similarities and differences in masses of 
data. 

The reader will be aware of questions to which he would like 
to have answers, which statistics can answer but which are not 
answered by the array alone. Around what rate of imprisonment 




do the rates tend to cluster? If the array is divided into parts with 
equal ranges, in what part do the largest number of rates appear? 
The array cannot answer such questions. That is the function of 
the frequency distribution, to which we shall now turn. 

4. THE FREQUENCY DISTRIBUTION

The frequency distribution is defined by Chaddock as follows: 
“An arrangement of quantitative data in order of magnitude, 
grouped by a selected class-interval of value so as to reveal clearly 
the internal structure of the mass of facts for the purpose in view, 
and so as to be accurate and useful for purposes of summarization, 
comparison, and analysis.” 3 If the frequency distribution is to do 
all this, that is, reveal the internal structure of the data and be 
useful for summarization, comparison, and analysis, it must be 
carefully constructed. It is a fundamental process in statistical 
analysis. Table VII presents the data of Table VI in the form of 
a frequency distribution: 

TABLE VII

Frequency Distribution of Jail Imprisonment Rates
According to Counties

Rate              Number of Counties

All                      91

Under 500                14
500-999                  36
1,000-1,499              18
1,500-1,999              12
2,000-2,499               7
2,500-2,999               2
3,000-3,499               0
3,500-3,999               2


The concentration is in the range from 500 to 999; more than a
third of all the counties have these rates, and less than half have 
rates greater than 999. In view of this fact it would be interesting 
to know why a small number of counties have rates much greater 
than the lower half, but all these statistics can do is to raise this 
question. 

3 Op. cit., p. 57.

It will be noticed in Table VII that the rates are grouped in
intervals of 500. All counties with rates less than 500 are put in
the class-interval, 0-500; all the counties with rates of 500 but
less than 1,000 are put in the class-interval, 500-999; etc., etc.
The number of counties whose rates fall within the limits of a 
class-interval is known as the class-frequency. In view of the fact 
that the frequency table is intended to convey some idea of the 
central tendency, or average magnitude, of the data, the size of the 
class-interval becomes important. Looking at the second class- 
interval of the table and noting that 36 counties have rates be- 
tween 500 and 999, one almost automatically thinks of the average 
rate of this class-interval as 750, that is, the mid-point of the 
class-interval. In an even distribution that is a fact, and it is the 
assumption made in dealing with all frequency distributions. 
Therefore, it is important to select a class-interval most accurately 
representing the data. For example, the simple arithmetic average 
of the rates in Table VII, found by adding all the rates and
dividing by 91, is 1,118; that is the absolute arithmetic average.
When the arithmetic average is computed from the data grouped
by class-intervals of 250, it is found to be 1,158; by class-intervals
of 500, it is 1,129; by class-intervals of 1,000, it is 1,094. The
average closest to the absolute average is that computed from the 
data arranged in class-intervals of 500. If the number of counties 
were large, say, 1,000 or more, the average computed from 
grouped data should be approximately the same as the simple 
average found by adding all items and dividing by the number of 
items. But even when the number of items is large, the size of the 
class-interval is important. In the class-interval, 2,500-2,999, there 
are only two items. Both are less than 2,750, but as a matter of 
fact they are assumed to be 2,750 in using the grouped data. The 
effect is to raise their value, and, hence, it is not surprising that 
the average computed with a class-interval of 500 is slightly larger 
than the simple average. It might just as well be smaller than the 
simple average, as is the case when computed from data grouped 
by class-intervals of 1,000. 
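
The computation of an average from grouped data may be sketched in a few lines of Python. The sketch is an illustration only and is not part of the original computation: it takes the class-frequencies of Table VII, treats the first class, "under 500," as beginning at 0 (an assumption), and places every county at the mid-point of its class-interval. The names TABLE_VII and grouped_mean are chosen here for illustration.

```python
# Class-frequencies of Table VII: (lower limit, upper limit, number of counties).
# Assumption: the class "Under 500" is taken to run from 0 to 499.
TABLE_VII = [
    (0, 499, 14),
    (500, 999, 36),
    (1000, 1499, 18),
    (1500, 1999, 12),
    (2000, 2499, 7),
    (2500, 2999, 2),
    (3000, 3499, 0),
    (3500, 3999, 2),
]


def grouped_mean(classes, width):
    """Arithmetic average of grouped data, each item taken at its class mid-point."""
    total_items = sum(freq for _, _, freq in classes)
    # Mid-point convention of the text: the class 500-999 is represented by 750.
    weighted_sum = sum((lower + width / 2) * freq for lower, _, freq in classes)
    return weighted_sum / total_items


mean_500 = grouped_mean(TABLE_VII, width=500)
print(round(mean_500))  # 1129
```

The result agrees with the average of 1,129 quoted above for class-intervals of 500.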

This raises the question of artificial concentration at certain 
values in a frequency distribution. Table VIII makes this point 
clear. 

TABLE VIII

Five Hundred Marks in English Classified by Single Per Cents 1

Grade                      Grade
Per Cent    Frequency      Per Cent    Frequency

   20          20             52          10
   21           0             53           3
   22           1             54           3
   23           1             55          20
   24           0             56           0
   25          20             57           1
   26           0             58           4
   27           0             59           0
   28           0             60          25
   29           1             61           3
   30          38             62          13
   31           0             63           8
   32           3             64           2
   33           3             65          15
   34           3             66           0
   35          47             67           2
   36           1             68           6
   37           0             69           0
   38           9             70          19
   39           2             71           1
   40          53             72           2
   41           0             73           0
   42           4             74           0
   43           2             75          10
   44           2             76           0
   45          55             77           1
   46           0             78           1
   47           5             79           0
   48          18             80           7
   49           0             85           3
   50          46             90           3
   51           4

1 Data from Chaddock, op. cit., p. 77.

Notice the concentration of frequencies on grades divisible by 5.
In grading papers an exact evaluation of the work is generally
impossible. Since people, including teachers, generally think more
easily in terms of numbers divisible by 5, grades tend to be given
in this manner. If it were decided to put the above data into a
frequency distribution with class-intervals greater than 1, the mid-
point of the class-interval should in all cases be a number divisible
by 5. Table IX presents these data in class-intervals of 5.

TABLE IX

Five Hundred Marks in English Classified by Intervals of 5 Per Cent

Grade
Per Cent    Frequency

18-22          21
23-27          21
28-32          42
33-37          54
38-42          68
43-47          64
48-52          78
53-57          27
58-62          45
63-67          27
68-72          28
73-77          11
78-82           8
83-87           3
88-92           3

The arithmetic average of the grades arranged in intervals of 5
with the numbers divisible by 5 falling at the mid-point is 47.
If the class-intervals are left the same size but rearranged so that
the numbers divisible by 5 fall at the top of each class-interval, the
average is 45.5. That is not a great difference, but it illustrates the
effect of the class-interval upon the average. The student will
frequently find data which for some artificial reason tend to con-
centrate at numbers divisible by 5, 10, 25, 50, 100, 500, 1,000.
Salaries are likely to be in terms of hundreds of dollars. If they
are classified into a frequency distribution, the mid-point of the
class-interval should fall on an even 100 or 1,000. There is often
seen some concentration of ages around numbers divisible by 5.
Retail prices of articles fall more often on numbers divisible by 5
than on any other. Likewise wages are likely to be on even dollars,
half-dollars, or quarters, though piece wages are more evenly dis-
tributed. Whenever there is any reason to suspect an artificial
factor operating to bring about concentration around certain num-
bers, these numbers should be ascertained before the class-interval
is decided upon, and then, if these numbers recur regularly, they
should be placed at the mid-point of the class-interval.
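
The effect of the placing of the class-interval upon the average can be checked with a short Python sketch. It is offered as an illustration under stated assumptions, not as the method by which the figures in the text were obtained: the dictionary MARKS restates the frequencies of Table VIII (grades with no marks are left out), and the function grouped_mean, a name chosen here, groups the marks into classes of 5 beginning at a chosen lower limit and treats every mark as lying at the mid-point of its class.

```python
# Frequencies of Table VIII keyed by grade (per cent); zero frequencies omitted.
MARKS = {
    20: 20, 22: 1, 23: 1, 25: 20, 29: 1, 30: 38, 32: 3, 33: 3, 34: 3,
    35: 47, 36: 1, 38: 9, 39: 2, 40: 53, 42: 4, 43: 2, 44: 2, 45: 55,
    47: 5, 48: 18, 50: 46, 51: 4, 52: 10, 53: 3, 54: 3, 55: 20, 57: 1,
    58: 4, 60: 25, 61: 3, 62: 13, 63: 8, 64: 2, 65: 15, 67: 2, 68: 6,
    70: 19, 71: 1, 72: 2, 75: 10, 77: 1, 78: 1, 80: 7, 85: 3, 90: 3,
}


def grouped_mean(freq_by_grade, first_lower, width=5):
    """Average of the marks after grouping them into classes of `width`,
    the first class beginning at `first_lower`; each mark is counted at
    the mid-point of its class."""
    total = 0
    weighted = 0.0
    for grade, freq in freq_by_grade.items():
        class_index = (grade - first_lower) // width
        lower = first_lower + class_index * width
        midpoint = lower + (width - 1) / 2  # the class 18-22 has mid-point 20
        total += freq
        weighted += midpoint * freq
    return weighted / total


centered = grouped_mean(MARKS, first_lower=18)  # classes 18-22, 23-27, ...
at_top = grouped_mean(MARKS, first_lower=16)    # classes 16-20, 21-25, ...
print(round(centered, 2), round(at_top, 2))  # 46.97 45.48
```

The two results correspond to the averages of 47 and 45.5 given above.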

Two other considerations enter into determining the size of the 
class-interval. General-purpose tables should have small class- 
intervals — intervals as small as anybody is likely to want. Special- 
purpose tables may have class-intervals of any size that gives 
satisfactory results. Such data as those published by the Bureau of 
the Census are for general use. The age distribution must be given
in small class-intervals so that it may be used by persons who
want a single-year distribution as well as by those who want 5- or
10-year distributions. The larger class-intervals can be made up
from the small ones, but the large ones could not be broken down 
into the small ones. For many purposes it is desirable to know the 
number of the population for each year of age, especially below 
five years of age. The census reports give these numbers, though 
they generally give the total for the 5-year period also. If the
statistician has assembled a large mass of data for a special purpose
but has an idea that he might use it for some other purpose, he 
must keep the data on file in class-intervals as small as he would 
ever want, though he may publish the results of a special study 
and use only large class-intervals. 

Occasionally a table does not have class-intervals of uniform 
size. Small class-intervals are used for the lower magnitudes, but 
large ones are introduced for presenting the frequencies in the 
higher magnitudes. This is sometimes done, because the frequencies 
in the larger magnitudes are small in number. For example, in 
Table VII only 11 counties have imprisonment rates of 2,000 or
more. All of these might have been grouped in a class-interval of 
2,000-3,999. An average computed from such a grouping would 
likely vary considerably from the true average. With such a group- 
ing in Table VII the average would be 1,186. This is much larger 
than the true average. Thus, it will be seen that, if the grouped 
data are to be used for obtaining an arithmetic average, they 
should be presented in uniform class-intervals. If there are special 
reasons for using class-intervals of different sizes in the same table, 
then the larger ones should be multiples of the smallest class- 
interval used. For example, the smallest class-interval might be 5, 
as in Table IX, but in the upper ranges the class-interval might be 
increased to 10 or 15. But even this practice limits the uses to 
which someone else might want to put the data. In special-purpose 
tables there is more justification for class-intervals of different sizes, 
but there is hardly any justification for the practice in general- 
purpose tables. 

There is a device which may be used with approximate ac- 
curacy to redistribute class-frequencies, if they happen to be given 
in class-intervals unsuitable to the purpose of the worker. This is 
a cumulative frequency curve. 4 Suppose we have the census dis- 
tribution of population in a city by age-groups and for some 
special purpose we need a different distribution. How could we 
determine the number of children 11 to 13 years of age, if we 
have only the number for 10 to 14 years of age given in the table? 
A cumulative frequency curve with age as the horizontal scale and 
numbers of the population for the vertical scale can be made. Then 
the number indicated by the curve at 13 years is found and the 
number indicated at 11 years is found. If we subtract the second
number from the first, we have approximately the number of
children 11-13 years of age. 5

4 See p. 175ff. for detailed description of cumulative frequency curves.
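
The reading of a cumulative frequency curve can be imitated numerically. The following Python sketch is a rough illustration only: the age-group counts are hypothetical, the names cumulative_points and read_curve are chosen here, and straight-line interpolation between the plotted points stands in for the smooth curve which the worker would draw by hand. Which exact ages mark the limits of "11 to 13 years" depends on how ages are recorded; the ages 11 and 13 are used here simply to follow the wording of the text.

```python
from bisect import bisect_right

# Hypothetical counts by 5-year age-group: (lower age, upper age, persons).
AGE_GROUPS = [(0, 5, 1000), (5, 10, 950), (10, 15, 900), (15, 20, 850)]


def cumulative_points(groups):
    """Points of the cumulative curve: (age, number of persons below that age)."""
    points = [(groups[0][0], 0)]
    running = 0
    for _, upper, count in groups:
        running += count
        points.append((upper, running))
    return points


def read_curve(points, age):
    """Read the cumulative curve at `age` by straight-line interpolation."""
    ages = [a for a, _ in points]
    i = bisect_right(ages, age) - 1
    i = min(max(i, 0), len(points) - 2)  # stay within the plotted range
    (a0, n0), (a1, n1) = points[i], points[i + 1]
    return n0 + (n1 - n0) * (age - a0) / (a1 - a0)


points = cumulative_points(AGE_GROUPS)
estimate = read_curve(points, 13) - read_curve(points, 11)
print(estimate)  # 360.0
```

With straight-line interpolation the estimate is merely two-fifths of the 900 persons in the 10-to-15 group; a smooth curve drawn through the same points would allow for the bending of the distribution and would ordinarily give a somewhat different figure.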

The limits of the class-interval should be determined and stated 
precisely. In Table VII the first class-interval is given as “under 
500.” That means that any rate falling short of 500 by however 
small an amount is placed in this class-interval, and the assumption 
is that in each of the other class-intervals rates falling short of the 
lower limit of the next class-interval by however small an amount 
belong in the class-interval below this limit. There is, then, no 
question as to what rates belong and are put in each class-interval. 
But suppose that the first class-interval were written “0-500” and 
the next one “500-1,000.” Where would a rate of 500 be put? 
Only the person who constructed the table could tell, and he 
might have forgotten just what he did. The class-intervals should 
be stated in numbers which are mutually exclusive. 
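
The rule of mutually exclusive limits can be put in the form of a short Python sketch. It is an illustration under stated assumptions: the function names are chosen here, the rates in the sample are invented for the purpose and are not the county data of Table VI, and the limits are written as whole numbers (0-499, 500-999, and so on).

```python
def class_interval(rate, width=500):
    """Return the (lower, upper) limits of the class-interval holding `rate`.

    The limits are mutually exclusive: 0-499, 500-999, and so on, so that
    a rate of exactly 500 belongs to 500-999 and to no other class.
    """
    lower = int(rate // width) * width
    return lower, lower + width - 1


def frequency_table(rates, width=500):
    """Count the rates falling within each class-interval."""
    table = {}
    for rate in rates:
        key = class_interval(rate, width)
        table[key] = table.get(key, 0) + 1
    return dict(sorted(table.items()))


# Illustrative rates only.
sample = [499.9, 500, 623, 999, 1000, 1497]
print(frequency_table(sample))
# {(0, 499): 1, (500, 999): 3, (1000, 1499): 2}
```

A rate of 499.9 falls short of 500 and is placed in the first class, while a rate of exactly 500 is placed in the second; no rate can be claimed by two classes.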

5 This device is illustrated by Whipple, G. C., Vital Statistics, pp. 75-77.
New York: Wiley, 1923.

5. EXERCISES

1. The forms which freshmen fill out at college, when they
matriculate, are a good source of data for practice in construct-
ing tables. These data are already gathered and require no field
work on the part of the student. From them construct the
following tables:

(a) Age distribution of freshmen by sex.

(b) Credits offered by freshmen to meet admission require-
ments. Make a frequency table.

(c) Occupations of the parents of freshmen.

(d) Height and weight.

2. Construct a schedule suitable to obtain the following informa-
tion from students: age, sex, occupation of parents, occupational
intentions of the student, height, weight, nationality, race. Each
student should take a number of these schedules and get the
necessary information from his friends. If no names are taken,
no objections should be encountered. The information obtained
by all the students can then be pooled so that each one will
have sufficient data with which to work. Construct tables which
exhibit the meaning of the data.

3. Take 100 leaves from a tree, measure the length of each leaf,
and present the measurements graphically as an array.

4. The following data are the distances in miles which 415 male
felons in Indianapolis in 1930 went from their homes to commit
the offenses for which they were sentenced by the court. Make
frequency tables from these data, using class-intervals of half a
mile and one mile:


 .86   .95  3.70  3.30  1.30  1.00   .54  1.86  1.00  2.81
2.16  1.00  2.54  4.41  1.76   .76   .89  2.11  2.13  2.13
4.46  2.89   .38   .97  3.19  1.73   .76  4.08  1.24   .62
2.22  1.51  8.43   .95   .95  4.21  2.08  2.02  4.05  1.81
1.30   .62  4.30  4.24  1.08  5.62  3.89  4.11   .76  8.43
3.35  3.03  3.52  2.27   .86   .54  1.00  1.00  2.00  3.76
2.76  1.05  1.05  3.73  2.89  3.16   .76   .92

1.08   .49  2.37  2.16  2.00  2.11  2.00  2.89   .87  2.37
 .76   .54  1.24  3.25   .49  4.43  1.08   .95  2.00  2.22
1.11   .49  2.16  3.14  2.68  3.00  2.62   .38  1.41  7.89
2.87  [?].03  2.00   .95  1.62   .95  3.76  1.03  1.08   .92
 .39   .81   .97   .62  1.62  4.43  3.03   .54   .97  1.00
1.49  2.30   .38  2.89  6.59  7.41  2.81  1.84  2.70  1.57
 .95   .86  1.08  1.41  1.92  1.14  4.76  1.49   .27   .54
1.97  2.49  2.76  1.19  1.08   .62   .76   .68   .97   .54
 .49  3.51   .76  2.16  1.00  1.03  1.22  4.22  6.68  1.19
 .76  1.03  1.68  2.14  1.30   .95  8.35  4.76  5.03  1.16
1.87  4.57  3.08  1.43  3.68   .95  5.16  1.41  1.16  3.30
1.49  4.30  4.08  1.19   .46   .86  3.[??]  2.38

1.70   .70  2.05  1.41   .73   .00  1.11  2.92  1.41   .59
1.22  1.65  2.65   .30  3.92   .97  1.00   .41  4.14  2.24
2.08   .76  1.51  3.65  2.05   .57  2.35  3.01  2.81  3.01
 .97  2.38  3.43  2.16  1.54  2.22  1.08  1.27  4.14  2.00

1.49  1.16   .51   .78  2.32  2.97  1.35  2.11  1.76  3.87
1.03   .73  2.27   .51   .81  1.46  1.86  1.16   .51  3.43
 .92   .95  1.05   .95   .95  1.16   .95  1.97  1.97   .84
1.65  2.54  3.38   .81  2.03  1.65  3.95  2.11  1.86  2.08
1.16  4.87  1.11  1.11  1.19  1.19  3.95  1.57   .86  2.27
4.65   .51   .92  1.00   .86  1.16   .97  4.33  3.03  1.00
2.00   .81   .81  2.21  1.14  4.05  4.65   .95  2.32  1.87
 .51  1.03  1.92   .86  1.00   .54  2.76   .81  1.97  1.27
 .73  6.16  3.32  1.97  3.68  3.35   .73  2.54  1.51   .46
1.30  4.00  2.00  1.73  3.46  3.11  1.51  2.51  2.16  3.24
1.08  2.81  1.19  1.24  2.03  1.19   .57  2.22  1.62   .46
1.32   .86   .76  1.05   .97  3.83  2.24   .57  3.24  2.16
1.30  1.30   .57   .57  1.22  1.19  1.78  1.78  1.84   .78
1.19   .43  1.35  2.22  2.03  2.24  1.30  1.11   .62  2.22
1.73  2.89  1.14  4.59  2.03  6.16  2.46  1.19  2.79  2.79
2.06  2.65  1.24  1.00  2.06  1.00  2.81  2.33   .49   .49
1.00  5.68  1.08  1.38   .86  2.65  1.19   .38   .46   .92
 .86  1.68  2.75   .6[?]  1.76  5.19   .38  2.75  1.78  3.30
1.43  2.03  1.68  1.51  4.60  1.73   .97



6. REFERENCES 

Burgess, R. W., Introduction to the Mathematics of Statistics, Chap. IV.
Chaddock, R. E., Principles and Methods of Statistics, Chap. V.
Gavett, G. I., First Course in Statistical Method, Chap. II.
Mills, F. C., Statistical Methods, Chap. III.
Mudgett, B. D., Statistical Tables and Graphs, Part I, Chap. III.
Secrist, Horace, An Introduction to Statistical Methods, Chap. VI.
Yule, G. U., An Introduction to the Theory of Statistics, Chap. VI.



CHAPTER VII 


Graphic Presentation 


1. INTRODUCTION

Graphic presentation of social statistics is a way of making ab- 
stract relations and magnitudes visible by means of symbols. A 
graph appeals to the eye. It pictures relationships and magnitudes 
in various symbols having conventionally accepted meanings. 
Graphic methods are introduced fairly late in the study of a social 
problem involving statistics. Long before they are required, the 
problem has been defined and data have been collected, tabulated, 
and classified. Even after the data have been classified, some other 
statistical analysis may be undertaken before graphs are con-
structed. But the analysis done at this point is more likely than not
to involve the use of graphic methods. Graphic methods are in
many respects simple, but it will be seen in this and later chapters
that line graphs may become rather complex in conception. Thus,
it will be seen that graphic methods serve an analytical as well as
a presentational purpose. This chapter is concerned with graphs of
both kinds, though for exhaustive treatment the reader is referred
to standard monographs on the subject of graphs. The presenta-
tional and analytical functions of graphs cannot be entirely sepa-
rated. Sometimes a graph which announces certain facts in an
emphatic way also serves an analytical purpose, and vice versa.
This double function of graphic methods is illustrated below:

TABLE X

The Number of New Protestant Denominations in Each 50-Year
Period, 1500 to 1900, as Represented in the United States 1

Period of Origin     Number of New Denominations
                     in Each Period

1500-1549                     4
1550-1599                     2
1600-1649                [illegible]
1650-1699                [illegible]
1700-1749                [illegible]
1750-1799                    10
1800-1849                    43
1850-1899                    80

1 See White, R. Clyde, Denominationalism in Certain Rural
Communities in Texas, p. 12. Training Course for Social Work,
Indiana University, Indianapolis, 1928.


Figure XVII. New Protestant Denominations in Each 50-Year Period,
1500 to 1900, as Represented in the United States
(Vertical scale: Y = new denominations.)

This chart shows that the tendency of the Christian Church to split 
into denominations or sects has been greater in recent times than
in the period immediately following the Protestant Reformation.
Any person glancing at the title of the chart, at the base line and 
left vertical line designations, and then at the curve would infer 
that new denominations arose much more rapidly in the second 
half of the nineteenth century than in any previous fifty-year 
period. The distance of points from the base line measures the 
rapidity of increase in denominations. As a method of analysis the 
chart shows change in number of denominations by definite pe- 
riods. Table X, of course, gives the same result. But psychologi- 
cally there is little doubt that the graph is more effective in 
convincing the reader of the strength of the drift to denominations. 
It presents the facts in their correct relations, and presents them 
effectively. It would do this without the table, but it is better to 
give the table also so that anyone who wishes may consult the 
exact figures. 

So much for what the chart shows. But the student beginning 
the study of statistics is interested in the mechanics of the graph. 
Periods of time are represented on the base line. One side of a 
square represents each period of fifty years, beginning at the left 
and going forward with time toward the right. In any graph in 
which time is one of the factors to be plotted, whether days, 
months, or years, it is customary to plot time along the base line. 
The other factor is plotted on the vertical line to the left, as indi- 
cated on this chart. The vertical scale starts with zero at the 
bottom and goes as high as the data require. Another thing to 
notice is that the points representing time are located in the middle 
of the squares in the horizontal direction. This is customary, be- 
cause a definite period of time has elapsed, and it is assumed that 
some denominations arose early in each fifty-year period, some 
about the middle, and some toward the end. Placing the point 
halfway between 1550 and 1600, or any other two terminal dates, 
gives each end of the period equal weight. 
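
The placing of the points along the base line can be stated as a small computation. The Python sketch below is only an illustration; the function name plotting_position is chosen here, and the periods are taken to run from one terminal date to the next, as in the example of 1550 and 1600 above.

```python
def plotting_position(period_start, period_end):
    """Horizontal position of a point that stands for a whole period of time."""
    return (period_start + period_end) / 2


# Terminal dates of the eight fifty-year periods, 1500 to 1900.
periods = [(start, start + 50) for start in range(1500, 1900, 50)]
positions = [plotting_position(start, end) for start, end in periods]
print(positions)
# [1525.0, 1575.0, 1625.0, 1675.0, 1725.0, 1775.0, 1825.0, 1875.0]
```

Each point of the curve is plotted above one of these positions, at a height equal to the number of new denominations for the period.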

In Chapter III it was pointed out that there are independent 
and dependent variables and that the statistician is primarily con- 
cerned with the relations existing between them. Time is always 
an independent variable. Whatever social phenomena appear, they 
must appear in time and in a definitely measurable period of time. 
Hence, the fifty-year periods of time in Figure XVII are the 
independent variable. The independent variable is by convention 
plotted on the horizontal line and is designated by X. In this 
problem “new denominations” are the dependent variable. They
occur in time, and they cannot occur without the passage of time.
They have become more frequent as time has passed. The varia- 
tion in number of denominations is a function of time. No arbitrary 
values can be assigned to “new denominations”; they are de- 
pendent upon the operation of other factors which are not meas- 
ured here — only time in which the variations occur is measured. 
The frequency of the variations in “new denominations” is caused 
by other factors. The dependent variable is plotted on the vertical 
line and is designated Y. 

The type of graph represented by Figure XVII is known as a 
line graph, because the data are represented by points connected by 
straight lines, or they might be represented by a smooth line 
drawn to fit the distribution of points. The most common line 
graphs are straight line graphs, nonlinear graphs, ratio charts, 
histograms, and frequency polygons. All these types of curves will 
be found to fit various kinds of social data. 

Line graphs are particularly useful in showing functional rela- 
tionships; that is, the relations of two series of data, or variables, 
which are causally related. Other graphic forms are bar charts, pie 
charts, pictograms, and cartograms; these will be discussed briefly 
under the heading of “Miscellaneous Graphic Devices” in the 
latter part of the chapter. 

Before proceeding to the detailed consideration of line graphs, 
two principles of general usefulness should be described: rectangu- 
lar coordinates and logarithms. 

2. RECTANGULAR COORDINATES 

The principle of rectangular coordinates is involved in the con- 
struction of all line graphs. It sounds like a formidable mathe- 
matical concept, but in fact it is an elemental fact of common 
experience, though we do not ordinarily think of Cartesian coordi- 
nates when this experience comes along. Suppose a man plans to 
build a house on a rectangular lot. He wants to place the house 
accurately. The lot is 100' x 125', and the house is to have a 40-
foot front. He decides that the house should be 30 feet from the
south side of the lot and that the west side of the house should 
be 30 feet from the west side of the lot. If he measures off 30 
feet directly east from the southwest corner of the lot and then 
turns north and measures off 30 feet, he will have located the 
point at which the southwest corner of the house will fall. In the 
following chart P indicates the southwest corner of the house: 



[Chart: the lot, 100 feet wide, with NORTH at the top and SOUTH at
the bottom; the points M, N, and P are marked.]

Figure XVIII. Location of the Southwest Corner of the House at P


Referring to the chart, the line MP is erected perpendicular to 
the south side at a point 30 feet from the corner, and the line 
NP is drawn perpendicular to the west side, which, of course, 
intersects the west side at a point 30 feet above the corner. The 
intersection of the lines MP and NP determines the location of 
the southwest corner of the house; these lines are the coordinates
of the point P. Since these two lines intersect at right angles to 
each other, they are rectangular coordinates. Thus, such a common 
experience as locating the corner of the foundation of a house in- 
volves the principle of rectangular coordinates. 




But how is this principle related to a line graph? The complete 
system of rectangular coordinates would be represented by four 
adjoining house lots, as in Figure XIX: 


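[Chart: the lines XX' and YY' cross at right angles and divide the
plane into four quadrants, numbered I, II, III, and IV.]

Figure XIX. Rectangular Coordinates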

The upper right section of the chart is known as the first quadrant, 
or the house lot represented in Figure XVIII, and the other 
quadrants, or house lots, are numbered II, III, and IV. But, drop- 
ping the analogy of the house lot, we have plotted the data from 
Table X in the first quadrant. Each point represents the inter- 
section of coordinates, and the line connecting the points makes 
the curve. 

Each of the coordinates of a point has a name. The horizontal
coordinate is called the abscissa, and the vertical coordinate is
called the ordinate. The base line is commonly designated X, and
the vertical line on the left is designated Y. The point of inter-
section of these two perpendicular coordinates is designated O and
is called the origin, or zero origin, meaning that both the X
coordinate and the Y coordinate at this point have a value of zero.
In plotting data for a curve the units on both the horizontal scale
and the vertical scale are measured off from the origin.

One other conventional practice in the use of coordinates should 
be noted, and that is the positive or negative sign of the coordi- 
nates in different quadrants. Both the abscissa and the ordinate are 
positive in the first quadrant. In the second quadrant the ordinate 
is positive, but the abscissa is negative. Both coordinates are nega- 
tive in the third quadrant. In the fourth quadrant the abscissa is 
positive, but the ordinate is negative. The general rule is that the 
abscissa is positive on the right of the origin and negative on the 
left of the origin. Correspondingly, the ordinate is positive above 
the origin and negative below the origin. In social statistics the first 
quadrant is used almost exclusively, though occasionally, as will 
be seen in Chapter XIII, the fourth quadrant will be used. It is 
conceivable that one might set up a statistical problem involving 
social data which would require the use of other quadrants. Graphs 
like Figure XVIII will be the more common, however, and only 
the first quadrant will appear in the presentation. 
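
The rule of signs may be summarized in a short Python sketch. It is an illustration only; the function name quadrant is chosen here, and a point lying on either axis is treated as belonging to no quadrant, which is an assumption of the sketch rather than a statement of the text.

```python
def quadrant(abscissa, ordinate):
    """Return the quadrant (I to IV) of a point, or None if it lies on an axis."""
    if abscissa == 0 or ordinate == 0:
        return None
    if abscissa > 0:
        return "I" if ordinate > 0 else "IV"
    return "II" if ordinate > 0 else "III"


print(quadrant(30, 30))    # I   (the corner P of the house lot)
print(quadrant(-30, 30))   # II
print(quadrant(-30, -30))  # III
print(quadrant(30, -30))   # IV
```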

Two other definitions are necessary. Referring to Figure XIX 
the line XX' is known as the x-axis, and the line YY' is known as 
the y-axis. Instead of referring to the base line or the vertical line 
on the left, it will be convenient to speak of the x-axis and the 
y-axis. 


3. LOGARITHMS 

Logarithms have a variety of uses in statistical work, particu- 
larly in graphic presentation. They are used most frequently in 
calculating geometric averages, certain index numbers, and in 
logarithmic and semi-logarithmic curves. A brief account of the 
theory and use of logarithms is necessary at this point. 

A logarithm is the power to which a fixed number, known as the
base, must be raised to equal a second number. For
example, 2 is the power to which 10, the base number, must be 
raised to equal 100, and 2 is the logarithm of 100. The power, 2, 
is the exponent of 10, that is, 10 is to be squared, and as the 
logarithm of 100 it represents the root of 100 which must be 
found in order to determine the base number. If b is the base,
x the power to which the base is to be raised, and N the number,
the exponential form is 

$b^x = N$
or $10^2 = 100$

The logarithmic form is

$x = \log_b N$
or $2 = \log_{10} 100 = 2.000000$

To use logarithms in multiplying two numbers the logarithms 
are added, and the sum of the logarithms of the numbers is equal 
to the logarithm of the product of the numbers. Then the number 
which is the product of the numbers may be found in a table of 
logarithms. Similarly, the logarithm of the quotient of two num- 
bers is equal to the difference of the logarithms of the numbers, 
and the quotient of the numbers is found in a table of logarithms. 
The square root, the cube root, or any other root of a number may 
be found by dividing the logarithm of the number by the index 
(e.g., 2 for the square root) of the root required. The quotient of 
this operation is the logarithm of the number which is the root 
required. This is important to remember, because many students 
will have forgotten how to extract the square root of a number 
and most of them will not know how to extract higher roots, 
whereas the use of a table of logarithms for this purpose is simple. 
Many later problems in this text will require square roots. 
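
These rules can be tried out with the logarithm function of Python's standard math module in place of a printed table. The sketch is an illustration only; the numbers 756 and 289 are taken from the examples of this section, and the helper name antilog is chosen here.

```python
import math


def antilog(logarithm):
    """The number whose common logarithm (base 10) is `logarithm`."""
    return 10 ** logarithm


a, b = 756.0, 289.0
log_a, log_b = math.log10(a), math.log10(b)

product = antilog(log_a + log_b)    # logarithms added to multiply
quotient = antilog(log_a - log_b)   # logarithms subtracted to divide
square_root = antilog(log_a / 2)    # logarithm divided by the index 2
cube_root = antilog(log_a / 3)      # logarithm divided by the index 3

print(product, a * b)
print(quotient, a / b)
print(square_root, math.sqrt(a))
print(cube_root ** 3, a)
```

Each pair of printed values agrees except for a very small error of rounding, just as results from a five-place table agree with direct computation only to about five significant figures.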

The common system of logarithms is calculated on the base, 10.
However, a system of logarithms might be calculated upon any
number as the base. The decimal system is more convenient, and
the published tables of logarithms all use this base. Appendix C
contains logarithms for numbers from 1 to 11,000 true to five
decimal places. If the logarithm of 100 is 2, then the logarithm
of 1,000 is 3, since 10 raised to the third power is 1,000. What
would be the logarithm of a number lying between 100 and
1,000? For example, 756. Consulting the table of logarithms, we
find in the first column to the right of the number, 756, the num-
ber, 87852. The logarithm of 756 obviously will be between 2
and 3. This large figure found in the table should have a decimal
point in front of it and, to the left of the decimal point, the num-
ber, 2. Hence, the logarithm of 756 is 2.87852.

There are two parts to every logarithm. That part to the left
of the decimal point never appears in the table, because it is de-
termined from the number of digits in the number. This part of
the logarithm is known as the characteristic and is always 1 less
than the number of digits in the number, that is, the number of
digits which lie to the left of the decimal point, if there is one in
the number. Therefore, the characteristic of the logarithm of 756 is 2.
That part of the logarithm which is to the right of the decimal
point is called the mantissa. This is the part of every logarithm
which is found in the table. The mantissa of a number is always
positive. The characteristic of a number of 1 or more is zero or
positive, but the characteristic of a number less than 1 is negative.
Suppose it is desired to know the logarithm of .00289, a number
which is less than 1. Look up the mantissa of 289 in the table; the
mantissa of 289 is the same, whether the number be 289, 28.9,
2.89, or .00289. The mantissa is found to be .46090. The
characteristic of a number less than 1 is negative and is 1 greater
than the number of zeros between the decimal point and the first
significant figure. Write the logarithm of .00289 thus:
$\bar{3}.46090$, or 7.46090 - 10. The bar over the 3 shows that the
characteristic alone is negative, while the mantissa remains positive.
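
The separation of a logarithm into its two parts may be verified with a short Python sketch. It is an illustration only, using the standard math module instead of the table of Appendix C; the function name characteristic_and_mantissa is chosen here.

```python
import math


def characteristic_and_mantissa(number):
    """Split the common logarithm of `number` into its characteristic
    (a whole number, which may be negative) and its mantissa (never negative)."""
    logarithm = math.log10(number)
    characteristic = math.floor(logarithm)
    mantissa = logarithm - characteristic
    return characteristic, mantissa


for number in (756, 289, 28.9, 2.89, 0.00289):
    characteristic, mantissa = characteristic_and_mantissa(number)
    print(number, characteristic, round(mantissa, 5))
```

For 756 the sketch gives the characteristic 2 and a mantissa of about .87852; for 289, 28.9, 2.89, and .00289 the mantissa is in every case about .46090, while the characteristic runs 2, 1, 0, and -3.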

After examining the table of logarithms it will be noticed that 
there are 10 columns of figures and that at the top of each is a 
number in heavy-face type. These numbers run from o to 9. If 
the logarithm of 289 is required, it is found to be 2.46090. But 
suppose the number is 289.7. What is the logarithm? In the 
column with 7 at the top and opposite 289 is the number 46195. 
Supplying the characteristic, we have 2.36195 which is the 
logarithm of 289.7. If the number were 289.74, a slightly differ- 
ent problem is presented, because the exact logarithm for this 
number is not given but must be found by interpolation. We sub- 
tract the mantissa of the logarithm for 289.7 from the mantissa of 
the logarithm for 289.8 which gives a remainder of .00015. The
significant figures in this quantity are 15. One of the little tables
on the margin of the page has 15 in heavy-face type at the top. 
We run down the column of heavy-face type figures at the left 
until we get to 4 which is the digit at the extreme right in 289.74. 
Opposite this number in the table and in the next column is 6.0, 
or it is really .00006. If this number is added to 2.46195, the 
sum is 2.46201 which is the logarithm of 289.74. 
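The interpolation just described amounts to adding a proportional part of the tabular difference. The following is a minimal sketch, assuming the two mantissas read from the table are 46195 and 46210 (the second follows from the stated difference of 15); the function name is hypothetical.

```python
# Mantissas as read from the table, five places, decimal point omitted.
MANTISSA_2897 = 46195   # row 289, column 7
MANTISSA_2898 = 46210   # row 289, column 8 (46195 plus the tabular difference, 15)

def interpolate_mantissa(lower, upper, next_digit):
    """Add next_digit tenths of the tabular difference to the lower mantissa."""
    difference = upper - lower
    return lower + round(difference * next_digit / 10)

mantissa = interpolate_mantissa(MANTISSA_2897, MANTISSA_2898, 4)
logarithm = 2 + mantissa / 100000   # characteristic 2, because 289.74 has 3 digits

print(mantissa, logarithm)
```

The proportional part here is 15 x 4/10 = 6, which is the entry the small marginal table supplies without any arithmetic.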

To find the number which corresponds to a logarithm the above
procedure is reversed. It should be remembered that only the
mantissa can be found in the table. Suppose the logarithm is
2.46201. What is the number to which it corresponds? Turning
to the table of logarithms, the first column of light-face type is
followed down until 46 is found. Then the remainder of the
mantissa will be found in another column and possibly in a
different row of mantissas. The nearest mantissa to .46201 is .46195.
That is the mantissa of 289.7. It is not exactly the mantissa which
we have. Subtracting the mantissa, .46195, from the mantissa,
.46210, in the next column we get 15. The difference between
our mantissa and .46195 is .00006. Consulting the table of
proportional parts which has 15 at the top, we follow down the
column of light-face type until we find 6 or the number nearest
to it. Opposite this number in the column to the left is 4. That
is the last digit of the number sought, which is 289.74.
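The reverse procedure can be sketched in the same way. The function name is illustrative, and the table entries 46195 and 46210 are assumed to be the two mantissas between which the given mantissa falls.

```python
def antilog_digit(mantissa, lower, upper):
    """Find the extra digit by inverse interpolation between two table entries."""
    tabular_difference = upper - lower   # 15 in the example
    excess = mantissa - lower            # 6 in the example
    return round(10 * excess / tabular_difference)

digit = antilog_digit(46201, 46195, 46210)
number = 289.7 + digit / 100   # the characteristic, 2, fixes the decimal point

print(digit, round(number, 2))
```

The division 10 x 6/15 = 4 does the work of looking for 6 in the table of proportional parts headed 15.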

4. THE STRAIGHT LINE GRAPH 

For some data in social statistics the graph is a straight line. 
This is due to the fact that the quantities change by equal incre- 
ments or decrements in a specified period of time. Figure XX will 
illustrate this principle. The data used in this graph are drawn 
from the field of crime and are given in Table XI. If a man is 
sentenced to federal prison for 10 years, he may reduce his time 
at the rate of 10 days per month for good conduct. 1 That is, when 
he has served a year of 365 days, he may get credit for having 
served 485 days. The table and graph follow:

TABLE XI

The Annual Accumulation of the Percentage of a 10-Year
Sentence Served Because of Good Conduct in a Federal
Prison

Year                  Percentage of Sentence
                      Served, End of Each Year

First                        13.28
Second                       26.56
Third                        39.84
Fourth                       53.12
Fifth                        66.40
Sixth                        79.68
Seventh                      92.96
Eighth (.53 yr.)            100.00


1 The Code of Laws of the United States of America, p. 514, sec. 710. In
force December 6, 1926.

If a prisoner received no deductions from his sentence for good
behavior, his sentence would be represented by the broken line (1),
but, if he had a perfect record and received maximum deductions
for good behavior, his time served would be represented by the
solid line (2). For perfect conduct the percentage of his sentence
served in a year of 365 days is not 10.0 per cent, but 10.0 per cent
plus 3.28 per cent. At the end of each year 13.28 per cent of his
total sentence would be deducted from what remained so that he
would be released from prison soon after the middle of the eighth
year instead of at the end of the tenth year.
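The figures of Table XI can be reproduced from the two percentages given above. This is a sketch under the assumption that the credit accumulates evenly at 13.28 per cent a year; the function and variable names are illustrative.

```python
SERVED_PER_YEAR = 10.0   # per cent of a 10-year sentence served in one calendar year
CREDIT_PER_YEAR = 3.28   # additional per cent credited for good conduct
RATE = SERVED_PER_YEAR + CREDIT_PER_YEAR   # 13.28 per cent a year

def cumulative_percentages(rate, limit=100.0):
    """Cumulative percentage at the end of each year, stopping at the limit."""
    rows = []
    year = 0
    while True:
        year += 1
        total = round(rate * year, 2)
        if total >= limit:
            rows.append((year, limit))
            return rows
        rows.append((year, total))

rows = cumulative_percentages(RATE)
years_to_release = 100.0 / RATE

for year, percentage in rows:
    print(f"year {year}: {percentage:6.2f}")
print(f"released after about {years_to_release:.2f} years")
```

The quotient 100/13.28 is about 7.53, which agrees with the entry "Eighth (.53 yr.)" in the table.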



Figure XX. Showing the Cumulative Percentage of Time Served on
a 10-Year Sentence in a Federal Prison (1) Without Deductions for
Good Behavior and (2) With Regular Monthly Deductions for Good
Behavior

Sometimes it is desirable to express a straight line in terms of 
an equation. This is particularly important if the slope of one 
straight line is to be compared precisely with the slope of another 
straight line. It is easy to see that lines (1) and (2) do not have 
the same slope; line (2) is steeper than line (1). But how much 
steeper is it? How much more rapidly does it rise toward the
100.0 line? The equations expressing the slopes of the two lines
placed beside each other show the difference in steepness
immediately and precisely. The slope of a line is determined by the ratio
of the height of the ordinate to the length of the abscissa, and the
formula is

m = y/x 

What do these symbols mean? It is very simple. Any specified 
distance on OY (referring back to Figure XX) is designated as y. 
Any specified distance on OX is designated as x. In the figure y 
is the same as ON or MP, and x is the same as OM or NP. Now 
let us measure the length of these distances, y and x. The side of 
each small square will be assumed to be divided into 10 equal 
parts. They are found to be as follows: 

y = 39.84%, or 39.84 small parts 
x = 1,095 days, or 30 small parts 

But it is the slope of the line in which we are interested. In order 
to find this, it is only necessary to substitute in the formula the 
values of x and y in terms of distance: 

Hence, m = 39.84/30.00

m = 1.328, slope or tangent of angle MOP

The values of x and y must be expressed in terms of distance on
the graph, and the slope, m, is found by dividing the value of y
by the value of x.

Looking at the broken line and thinking of y as MP1 and x as
OM, we can compute the slope of the broken line in the same
manner, as follows:

m = y/x
m = 30.00/30.00

m = 1.000, slope or tangent of angle MOP1

Comparing the slopes of the two lines now, it is seen that the solid 
line is steeper by .328 than the broken line. The comparative 
steepness of two straight lines on different charts can be shown 
exactly by the formula above. An important fact about the slope 
of straight lines is that, if y is smaller than x, then m is less than
1. If they are the same size, then m is 1. If y is greater than x, as
in this example, m is greater than 1.
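The two slopes can be computed side by side. The sketch below uses the graph distances given in the text; the function name and the conversion of each slope to an angle are additions for illustration.

```python
import math

def slope(y, x):
    """Slope m = y / x, both distances measured in the same graph units."""
    return y / x

m_solid = slope(39.84, 30.00)    # line (2), with deductions for good behavior
m_broken = slope(30.00, 30.00)   # line (1), without deductions

difference = m_solid - m_broken
angle_solid = math.degrees(math.atan(m_solid))    # angle whose tangent is m
angle_broken = math.degrees(math.atan(m_broken))

print(round(m_solid, 3), round(m_broken, 3), round(difference, 3))
print(round(angle_solid, 1), round(angle_broken, 1))
```

Because the slope is the tangent of the angle which the line makes with the base line, a slope of 1 corresponds to an angle of 45 degrees, and the solid line makes a somewhat larger angle.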

Some straight lines cut the zero ordinate, OY, above the origin, 
O. How can the slope of a line which does that be computed? 
First, let us consider a problem in which this occurs. A sum of 
money is placed at simple interest, and the interest is allowed to 
accumulate. Table XII gives the accumulation of $1,000 at 6
per cent interest at the end of a 10-year period, and Figure XXI
presents the data graphically (p. 149).

TABLE XII

The Accumulation of $1,000 at 6 Per Cent Simple Interest at
the End of Each Year of a 10-Year Period

Year          Principal
              Plus Interest

First            $1,060
Second            1,120
Third             1,180
Fourth            1,240
Fifth             1,300
Sixth             1,360
Seventh           1,420
Eighth            1,480
Ninth             1,540
Tenth             1,600


The line cuts the zero ordinate at M. Draw MN and NP, which 
in this problem are x and y respectively. The formula is the same 
as before: 


m = y/x 
x = 5 
y = 1.2 
m = 1.2/5 

m = .24, the slope of MP 

While this formula gives the slope of the line, it does not 
describe completely the line in its relations to the system of co- 
ordinates which is utilized in the construction of the graph. The 
general equation of a straight line, expressed in these terms, is 

y = mx + b. 

In this formula y equals the distance of any point on the line from 
the base line, or zero abscissa; m is the slope of the line; x is the 
length of the abscissa; and b is the distance between O and the 
point where the line cuts OY. The curve for the accumulation of 
a sum of money at simple interest is represented by an equation 
of the type of this formula. It is sometimes convenient to refer 
to a graph as of such and such a type, giving the equation instead 
of the graph.
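The general equation can be tried on the data of Table XII. In this sketch x and y are taken in the units of the data (years and dollars) rather than in distances on the graph, so the slope is 60 dollars a year and not the .24 measured on the chart; the names are illustrative.

```python
PRINCIPAL = 1000.0     # b, the point where the line cuts OY, in dollars
INTEREST_RATE = 0.06   # simple interest per year
m = PRINCIPAL * INTEREST_RATE   # slope in dollars per year

def amount(years):
    """y = mx + b for the accumulation of a sum at simple interest."""
    return m * years + PRINCIPAL

table = [(x, amount(x)) for x in range(1, 11)]
for x, y in table:
    print(f"year {x:>2}: ${y:,.0f}")
```

Setting x equal to 0 gives y = b = $1,000, which is the height at which the line cuts the zero ordinate.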

The straight line graphs have many uses, most of which will 
be described later, because they enter into more complicated
statistical methods. Regression lines in simple linear correlation are
straight lines and will be described in Chapter XI. Trends may be
indicated by straight lines, and they will be described in Chap- 
ter XIII. 

Figure XXI. The Accumulation of $1,000 at 6 Per Cent Interest
at the End of Each Year of a 10-Year Period
(Y = dollars, 0 to 2,000; X = years, 0 to 10; the line cuts OY at M)


5. SEMI-LOGARITHMIC CHARTS

In the graphs which have preceded, the actual numbers have 
been plotted. But sometimes it is desirable to plot the logarithms
of the numbers instead of the actual numbers, at least on the
vertical scale. A semi-logarithmic chart shows at a glance the rate 
of change, whereas the chart constructed from the actual numbers 
does not make this obvious. The semi-logarithmic chart has the 
natural numbers plotted on the horizontal scale and the logarithms 
of the second series plotted on the vertical scale. Ordinary graph 
paper may be used, in which case the worker looks up the loga- 
rithms for the series of numbers plotted on the vertical scale and 
shows only the logarithms for the series on the y-axis of the chart. 
Ratio, or semi-logarithmic, paper may be purchased which is ruled 
on the y-axis according to the logarithmic scale and obviates the 
necessity of looking up the logarithms. 
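Why the logarithmic scale shows the rate of change may be seen from two small series. Both series below are hypothetical and are chosen only for illustration: one grows at a constant rate, the other by a constant amount.

```python
import math

def log_steps(values):
    """Differences between the common logarithms of successive items."""
    logs = [math.log10(v) for v in values]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]

constant_rate = [100, 200, 400, 800, 1600]    # hypothetical: doubles each period
constant_amount = [100, 200, 300, 400, 500]   # hypothetical: adds 100 each period

print([round(step, 5) for step in log_steps(constant_rate)])
print([round(step, 5) for step in log_steps(constant_amount)])
```

The series with a constant rate rises by the same distance on the logarithmic scale in every period and so plots as a straight line. The series with a constant amount of increase rises by smaller and smaller distances, because each addition of 100 is a smaller percentage of the preceding item.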

The difference in appearance of data represented on the natural 
scale and on the logarithmic scale will be illustrated. 

TABLE XIII

Population of the United States at Each Census, 1790 to
1930

Year          Population

1790           3,929,214
1800           5,308,483
1810           7,239,881
1820           9,638,453
1830          12,866,020
1840          17,069,453
1850          23,191,876
1860          31,443,321
1870          38,558,371
1880          50,155,783
1890          62,947,714
1900          75,994,575
1910          91,972,266
1920         105,710,620
1930         122,775,046


The data will be presented in two charts: the first uses the natural
scale; the second uses the logarithmic scale along the y-axis.

Figure XXII shows the growth of population of the United 
States from 1790 to 1930 plotted on the natural scale. It indicates 
at a glance that the total population was small for the first five 
decades, but that after this point the aggregate number added 
every ten years increased markedly, and the largest single increase 
occurred from 1920 to 1930. The absolute increase has been 
higher for each succeeding decade except for the decades including
the Civil and World wars. But Figure XXII tells nothing about
the rate of increase in each decade. Does the population of the 
United States show a correspondingly increased rate of growth? 



Figure XXII. — Population of the United States, 1790-1930 
(Natural Scale) 

Figure XXIII, drawn to the logarithmic scale on the y-axis, answers
this question. The rate of increase was larger in the earlier,
instead of the later decades. The upper end of the curve shows
a tendency to flatten out and to point to the time of an approximately
stationary population in the not distant future. On the
ratio chart the distance of the curve at any point from the base line
is not significant. Only the slope of the curve in Figure XXIII is
significant; it indicates the rate of change. Frequently the rate of
change in a series of social data is of primary importance, and the
aggregate increase may assume a secondary interest. In such cases
the ratio chart should be used for presenting the facts.

Figure XXIII. Population of the United States, 1790-1930
(Semi-Logarithmic, or Ratio, Scale)
(Y = population in millions; X = years)
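The contrast between the aggregate increase and the rate of increase can be computed directly from the census figures of Table XIII. The sketch below is illustrative; the function name and the layout of the rows are assumptions.

```python
import math

POPULATION = {
    1790: 3_929_214, 1800: 5_308_483, 1810: 7_239_881, 1820: 9_638_453,
    1830: 12_866_020, 1840: 17_069_453, 1850: 23_191_876, 1860: 31_443_321,
    1870: 38_558_371, 1880: 50_155_783, 1890: 62_947_714, 1900: 75_994_575,
    1910: 91_972_266, 1920: 105_710_620, 1930: 122_775_046,
}

def decade_changes(population):
    """Rows of (start, end, increase, per cent increase, difference of logarithms)."""
    years = sorted(population)
    rows = []
    for start, end in zip(years, years[1:]):
        before, after = population[start], population[end]
        rows.append((start, end, after - before,
                     100 * (after - before) / before,
                     math.log10(after) - math.log10(before)))
    return rows

rows = decade_changes(POPULATION)
largest_increase = max(rows, key=lambda row: row[2])
highest_rate = max(rows, key=lambda row: row[3])

for start, end, increase, rate, log_difference in rows:
    print(f"{start}-{end}: {increase:>11,}  {rate:5.1f}%  {log_difference:.4f}")
```

The largest aggregate increase falls in the decade 1920 to 1930, but the highest percentage rate falls in one of the early decades. The last column is the vertical distance which each decade covers on the ratio chart, and it moves with the percentage rate, not with the aggregate increase.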



Figure XXIV. — Population of the United States, 1790-1930 (Logarithms 
of Population Plotted on the Vertical Scale) 


Figure XXIV shows the same data on ordinary graph paper, but 
in this chart the logarithms of the numbers were found in a table 
and plotted, instead of the natural numbers. 

The form of this curve is similar to that in Figure XXIII. However,
the work can be done more rapidly, if ratio paper is used,
because that obviates the necessity of looking up the logarithms of
the numbers.

In public welfare work administrators and the public are often 
interested in the rate of change from year to year. Figure XXV 
presents weighted index numbers of public welfare work in Indiana 
from 1900 to 1927 (p. 155). 2 

TABLE XIV

Weighted Indexes of Public Welfare Work in Indiana,
1900 to 1927

Year     Index        Year     Index

1900      88.9        1914     101.0
1901      88.1        1915     109.7
1902      88.7        1916     109.3
1903      89.2        1917     110.0
1904      92.9        1918      99.9
1905      93.8        1919      97.8
1906      94.9        1920      94.7
1907      92.6        1921      98.9
1908      95.5        1922     102.7
1909      92.3        1923     101.6
1910      95.3        1924     106.7
1911      97.0        1925     115.3
1912      99.2        1926     118.0
1913     100.0        1927     121.9


These index numbers vary from 88.1 in 1901 to 121.9 in 1927
which represents a large increase in the volume of work done (al- 
lowance was made in the index for increasing population and for 
changes in the purchasing power of the dollar), but the percentage 
change from one year to another is small. The volume of work is 
steadily growing, but the rate of increase is not large. The average 
increase was found to be a little less than 1 per cent a year, when 
the straight line trend was computed. The answer to the question 
of whether the ordinary chart or the ratio chart should be used 
depends upon the purpose of the worker. That should be clear 
before the form of presentation is decided. 

2 These data are taken from “Indexes of Public Welfare in Indiana,” by
R. Clyde White. Social Forces, Vol. VIII, No. 2, p. 251.

Figure XXV. Weighted Index of Public Welfare Work in Indiana,
1900-1927 (Semi-Logarithmic Scale)
(X = years, 1900 to 1927)

6. CUMULATIVE CHARTS

In social planning it is necessary to estimate the probable volume
of work and the necessary budget for 12 months or more in
advance. In the case of budgets made on a biennial basis, such as
those requiring appropriations from state legislatures or from
Congress, it is necessary to plan two years in advance. If the need
for the service has a general trend upward, then allowance must
be made for a probably larger outlay in the second year than in
the first year of the biennium. City departments also have to
estimate their needs in advance. Once funds are available, the social
agency or public department has to allot them on a monthly basis
so that they will be distributed according to expected requirements.
One of the statistical devices for keeping a close check on actual
expenditures in relation to budgetary estimates is the cumulative
chart.

An example drawn from the field of family case work will 
illustrate the value of the cumulative chart for such purposes. The 
expenditures of the Indianapolis Family Welfare Society were
obtained by months for a period of four years. 3 The average
amount for each month was found for the four-year period, and 
then the percentage distribution by months was obtained. These 
percentages were cumulated by months and are represented in 
Figure XXVI by the solid line. The broken line shows the cumu- 
lated percentages through August, 1928, on the basis of an esti- 
mated relief budget of $50,000 for the year. 

TABLE XV

Cumulated Percentages of Actual Expenditures by Months
for 1928 and Cumulated Percentages of Budget Estimates
for the Entire Year

Month           Percentages Cumulated,     Percentages Cumulated,
                1928                       Estimates

All                  86.3                       100
January              13.1                        12.1
February             27.3                        24.1
March                42.0                        35.4
April                52.2                        44.1
May                  61.4                        51.5
June                 [illegible]                 57.9
July                 77.8                        64.3
August               86.3                        70.6
September                                        76.4
October                                          82.6
November                                         89.9
December                                        100.0


This chart shows that the actual expenditures through August, 
1928, were running steadily ahead of the budgetary estimate and 
that the funds would be exhausted before the end of the year. 
Such a situation is not uncommon in the history of relief agencies, 
because economic conditions cannot be predicted a year in advance. 
All the agency can do is to make as careful an estimate as possible 
and then make readjustments as new conditions are discovered. In 
the above case, either expenditures must be sharply reduced or 
additional funds obtained. At the end of any month in the year 
the relief agency could quickly see from the chart the financial 
problem it is facing. For presenting such data to boards of direc- 
tors, the cumulative chart is very effective, and it is a useful guide 
to the executive who is trying to control expenditures by a monthly
quota system. It will be obvious that the same kind of chart can 
be used to advantage by a manufacturer who plans his production
for the year on a monthly basis, as a means of closely following
the seasonal variations in the demand for his product and of
stabilizing production and employment. The cumulative chart acts
as a sort of budgetary, or production, barometer.

Figure XXVI. Comparison of Budgetary Estimate and Actual
Expenditures in 1928 Through August, Indianapolis Family Welfare
Society, in Terms of Cumulated Percentages
(Y = cumulated percentages, 0 to 100; X = months; solid line, budget
estimate 1928; broken line, expenditures 1928)

3 Data from monthly reports of relief published currently by the Russell Sage
Foundation, Department of Statistics.
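The check which the cumulative chart provides can also be made by arithmetic. The sketch below uses the percentages of Table XV and the estimated budget of $50,000; June is omitted from the actual series because its value is not legible in the table, and the function names are illustrative.

```python
BUDGET = 50_000   # estimated relief budget for 1928, in dollars

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
ESTIMATE = [12.1, 24.1, 35.4, 44.1, 51.5, 57.9,
            64.3, 70.6, 76.4, 82.6, 89.9, 100.0]
ACTUAL = {"Jan": 13.1, "Feb": 27.3, "Mar": 42.0, "Apr": 52.2,
          "May": 61.4, "Jul": 77.8, "Aug": 86.3}

def cumulate(shares):
    """Running totals of a series of monthly percentages."""
    total, totals = 0.0, []
    for share in shares:
        total += share
        totals.append(round(total, 1))
    return totals

# Monthly shares implied by the cumulated estimates.
monthly_estimate = [round(b - a, 1) for a, b in zip([0.0] + ESTIMATE, ESTIMATE)]

def overrun(month):
    """Points by which cumulated expenditures exceed the cumulated estimate."""
    return round(ACTUAL[month] - ESTIMATE[MONTHS.index(month)], 1)

for month in ACTUAL:
    points = overrun(month)
    print(f"{month}: {points:+.1f} points, ${BUDGET * points / 100:,.0f}")
```

Through August the expenditures stand 15.7 points above the estimate, which on a budget of $50,000 is $7,850 more than was planned for the first eight months.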






This chart has still other forms and uses. Figure XXVII is con- 
structed according to both the “more than” and the “less than” 
methods. These two methods can be explained best by reference to 
Table XVI: 


TABLE XVI

Felons Sentenced in the Marion County Criminal Court,
1930, According to the Percentage above (More Than) or
below (Less Than) a Specified Age. 651 Felons

Age       Per Cent of Felons     Per Cent of Felons
          More Than              Less Than
          Specified Age          Specified Age

16-           100.0                    0.0
20-            70.2                   29.8
25-            42.5                   57.5
30-            31.1                   68.9
35-            17.7                   82.3
40-            10.2                   89.8
45-             5.6                   94.4
50-             2.4                   97.6
55-             1.8                   98.2
60-             1.0                   99.0
65-              .5                   99.5
70-              .2                   99.8
75-              .0                  100.0


One hundred per cent of all felons are “more than” 16 years of 
age, and none are “less than” 16 years of age. That is, all have 
passed the sixteenth birthday. Forty-two and five-tenths per cent 
have passed the twenty-fifth birthday, and 57.5 per cent have not 
reached the twenty-fifth birthday. If these two columns of per- 
centages are plotted, they appear as in Figure XXVII (p. 159). 
To read the “more than” curve, look at any age on the horizontal 
scale, note the point on the ordinate erected from this point at 
which the “more than” curve cuts it, and read the percentage on 
the vertical scale opposite this point. This percentage is the per- 
centage of felons at or above this age. The “less than” curve is 
read in a similar manner except that the percentage read is the 
percentage of felons who are less than this age. 
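The two columns of Table XVI are complements of each other, and the percentage falling within each age interval can be recovered from either one. The sketch below is illustrative; the crossing point is found by straight-line interpolation, which assumes that ages are spread evenly within the interval.

```python
AGES = [16, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75]
MORE_THAN = [100.0, 70.2, 42.5, 31.1, 17.7, 10.2, 5.6, 2.4, 1.8, 1.0, 0.5, 0.2, 0.0]

# The "less than" column is 100 minus the "more than" column.
less_than = [round(100.0 - p, 1) for p in MORE_THAN]

# Per cent of felons within each interval: 16-20, 20-25, and so on.
within_interval = [round(a - b, 1) for a, b in zip(MORE_THAN, MORE_THAN[1:])]

def crossing_age(ages, more_than):
    """Age at which the two curves meet, where each stands at 50 per cent."""
    points = list(zip(ages, more_than))
    for (a0, p0), (a1, p1) in zip(points, points[1:]):
        if p0 >= 50.0 >= p1 and p0 != p1:
            return a0 + (a1 - a0) * (p0 - 50.0) / (p0 - p1)
    return None

print(less_than)
print(within_interval)
print(round(crossing_age(AGES, MORE_THAN), 2))
```

The two curves cross between 20 and 25 years, at the age which divides the felons into two equal groups.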

In looking at this chart, it should be noticed that from 16 to 20 
is a period of only four years, whereas from 20 to 25 is a five-year 
period. Hence, allowance is made for this fact in marking off the 
horizontal scale, and the distance from 16 to 20 is only four-fifths
of the distance between the other figures. 



Figure XXVII. Cumulative Curves Showing the Age Distribution of
Felons in Indianapolis in 1930 on a “More Than” and on a “Less
Than” Basis (651 Felons)
(Y = cumulated percentages, 0 to 100; X = ages, 16 to 75; one curve
for more than specified age, one for less than specified age)





7. THE HISTOGRAM AND THE FREQUENCY POLYGON 

The frequency distribution was discussed in Chapter VI, but it 
was presented there only as it appears in tables. Frequency dis- 
tributions may be presented in graphic form also. Indeed, their 
graphic presentation is as common in statistical studies as is the 
tabular form. It was seen in Chapter VI that the array makes it 
possible for the statistician and the reader to gain a better idea of 
the meaning of a body of data than can be gained from the ex- 
amination of an unorganized group of items. The array indicates 
the range of values from the lowest to the highest item. After 
discussing the array, it was shown how a still clearer idea could be 
obtained by grouping the data in class-intervals. This step prepared 
the material for presentation in the form of a frequency distribu- 
tion. The histogram and the frequency polygon are two additional 
ways of reducing data to intelligible form. Mass data cannot be 
understood without resort to various devices for bringing out their 
meaning. The histogram will be considered first. 

The first problem chosen to illustrate the use of the histogram 
is that of determining the age distribution of male employees in 
six moderately large firms: one department store, one street rail- 
way, and four factories. 4 Table XVII gives the data for the six 
firms in 10-year class-intervals: 

TABLE XVII

Age Distribution of Male Employees in 6 Indianapolis
Firms

Age             Number of
                Employees

All Ages          5,319

15-24             1,307
25-34             1,757
35-44             1,245
45-54               688
55-64               322


The age group with the greatest frequency is 25-34 years. This 
fact is obvious from the table, but it is made more emphatic by 
the histogram in Figure XXVIII.

Each column in the histogram represents the number of
employees at each age period. The concentration in the period, 25 to
34 years, is marked, and the small number in the period, 55 to 64
years, is no less marked by the small height of the column.

4 Data from an unpublished study by the author.

[Figure XXVIII. Y = workers]

The mechanics of the chart require some explanation. The
columns have the same width, because each represents in the
horizontal direction the same period of time. It is customary to plot
the frequencies on the ordinates, that is, the vertical direction, and 
to chart the other variable, years in this case, on the abscissas, that 
is the horizontal direction. The table shows that in the first age 
group there were 1,307 men. At the termination of the distance on
the abscissa which is required for the first age period a vertical
line is drawn upward until the end is opposite a point on the verti- 
cal scale equaling 1,307. From that point a line is drawn until it 
meets the zero ordinate at 1,307. It will be remembered from the 
discussion of class-intervals that the assumption is made that the 
items in the class-interval are distributed evenly from the lowest 
to the highest value. That assumption is made graphic here by the 
short horizontal line at the top of the column. It is parallel to the 
base line. If it were assumed that there are more items concentrated 
at the upper end of the class-interval than at the lower end, the 
line would slope upward toward the right. Later it will be shown 
that this is a fact in some distributions, but for such a rough 
presentation as the histogram provides it is unnecessary to give 
attention to this fact. The second column presents the number of 
men between 25 and 34 years of age. The right-hand side of the 
first column forms the left-hand side of the second column for a 
part of the distance, but it is prolonged to a point equal to 1,757 
on the zero ordinate. At the upper terminus of this age group 
another vertical line is drawn equal in height to the first one. Then 
they are joined by a horizontal line. The other columns are formed 
in similar fashion. Thus, we have a comparison of the numbers of 
men in each age group, and we note the concentration in the 
second age group. 
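The construction of the columns can be imitated with rows of marks, the length of each row being proportional to the frequency. This is only a rough sketch: the rows run horizontally instead of vertically, and the width of 40 marks for the largest group is an arbitrary assumption.

```python
AGE_GROUPS = [("15-24", 1307), ("25-34", 1757), ("35-44", 1245),
              ("45-54", 688), ("55-64", 322)]

def bar_lengths(groups, width=40):
    """Scale each frequency so that the largest group gets `width` marks."""
    largest = max(count for _, count in groups)
    return [(label, count, round(width * count / largest))
            for label, count in groups]

bars = bar_lengths(AGE_GROUPS)
for label, count, length in bars:
    print(f"{label}  {'#' * length}  {count:,}")
```

Every row covers one class-interval of the same size, and the flat end of each row stands for the assumption that the items are spread evenly through the interval.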

TABLE XVIII

Age Distribution of Male Employees in 6 Indianapolis Firms and of the Total
Male Population of Indianapolis for the Same Age Periods (Census of 1920)

Age         Male Employees          Males in Population
Group       Number    Per Cent      Number     Per Cent

Total        5,386     100.0        114,111     100.0

15-19          263       4.9         11,516      10.1
20-24        1,044      19.4         14,936      13.1
25-29        1,002      18.6         15,675      13.7
30-34          755      14.0         14,361      12.6
35-39          715      13.3         14,000      12.3
40-44          530       9.8         10,385       9.1
45-49          371       6.9         10,426       9.1
50-54          317       5.9          8,824       7.7
55-59          202       3.8          6,051       5.3
60-64          120       2.2          4,752       4.2
65-69           67       1.2          3,185       2.8

Figure XXIX. Age Distribution of Workers, 5-Year Class-Intervals

But does such a large class-interval make the meaning of the
data as clear as a smaller class-interval? In order to compare the
visual effects of a chart which uses a smaller class-interval, the
data have been divided into class-intervals of five years each. They
are presented in Table XVIII and are shown in Figure XXIX.
The mechanics of construction are, of course, the same as in Figure 
XXVIII, but the impression one gets from the second chart is
that the number of employees in age groups above 20 to 24 years
tapers off more slowly than would be suspected from the first chart.
In other words, while the first chart is technically correct and 
presents nothing but the facts, the large class-interval obscures the 
gradual decline in numbers in the higher age groups. This fact 
raises the question whether or not the male population in Indianap- 
olis is distributed in a manner similar to that of the employees in 
Figure XXIX. The decline in numbers of employees is very grad- 
ual — almost in a straight line. Would a histogram of the males in 
Indianapolis between 15 and 64 years of age have a similar form? 
This raises a question which does not permit one to say that the 
employers are discriminating against the older man but suggests 
a further inquiry to clarify this point. Figure XXX presents two 
histograms together (data from Table XVIII): the solid line is a 
reproduction of Figure XXIX, and the broken line presents the 
age distribution of the male population in the entire city. The 
vertical scale of this chart is in terms of percentage instead of 
the actual numbers, because the population class frequencies are 
so much greater than those of the employees that an accurate 
comparison could not otherwise be made between the two 
series. 

Some significant differences appear between the two histograms of 
Figure XXX. The number of men in the upper age groups does 
gradually decrease, but the largest percentage in any age group 
does not reach the largest percentage in certain age groups of the 
employees. After the forty-fifth year the percentage of men in the 
population in each age group exceeds the percentage employed by 
the six firms. It is clear, then, that younger men predominate in 
these firms and that either older men are not accepted as readily
as younger men or the firms do not attract the older men. The last
possibility suggests that still further inquiry is necessary, before it 
is possible to decide whether these firms discriminate against older 
men. As a matter of fact, the older men are found in relatively 
smaller numbers. Would they be employed, if they applied for 
jobs? The data presented here are inadequate to answer that 
question. The histograms merely present the conditions as they 
exist, which is their purpose. 
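The percentages on which the comparison rests can be recomputed from the numbers in Table XVIII. In the sketch below the population figure for ages 65 to 69 is taken as 3,185, the amount needed to reach the stated total of 114,111; the names are illustrative.

```python
AGE_GROUPS = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44",
              "45-49", "50-54", "55-59", "60-64", "65-69"]
EMPLOYEES = [263, 1044, 1002, 755, 715, 530, 371, 317, 202, 120, 67]
POPULATION = [11516, 14936, 15675, 14361, 14000, 10385,
              10426, 8824, 6051, 4752, 3185]

def percentages(counts):
    """Per cent of the total falling in each class."""
    total = sum(counts)
    return [round(100 * count / total, 1) for count in counts]

employee_pct = percentages(EMPLOYEES)
population_pct = percentages(POPULATION)

# Age groups in which the population percentage exceeds the employee percentage.
population_higher = [group for group, e, p
                     in zip(AGE_GROUPS, employee_pct, population_pct) if p > e]

for group, e, p in zip(AGE_GROUPS, employee_pct, population_pct):
    print(f"{group}: employees {e:4.1f}  population {p:4.1f}")
print(population_higher)
```

Reducing both series to percentages puts 5,386 employees and 114,111 males on the same vertical scale. The list shows every age group from 45 upward, and also the youngest group, 15 to 19 years.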

Figure XXX. Comparison of the Age Distribution of Employees in
Six Firms and of the Total Male Population of Indianapolis Between
15 and 64 Years of Age in Terms of Percentage
(Y = percentage; solid line, employees; broken line, population)

The preceding histograms have shown that the high percentages
of employees in the six firms come at the earlier working ages.
Some social data are distributed in different ways. It is common to
find social series which have the concentration at the middle and
also at the upper end of the scale. Death rates by age groups show
concentration at both ends of the age scale. An illustration will be 
given of data which concentrate near the center of the scale. Table 
XIX gives the distribution by ages of children in the eighth grade 
in the St. Louis Public Schools. 

TABLE XIX

Distribution of Children in the Eighth Grade, St.
Louis Public Schools, by Ages 1

Age             Number of
                Children

All Ages          4,721

10                    1
11                   25
12                  348
13                1,330
14                1,684
15                  971
16                  308
17                   50
18                    4

1 Data from Woodrow, Herbert, Brightness and Dullness
in Children, Lippincott, Philadelphia, 1919, p. 130.

In Figure XXXI the data are presented in the form of a histogram 
to show the concentration at the middle of the age scale (p. 167). 
This histogram is almost symmetrical: it rises steeply from the 
lowest age group to the middle age group, and then declines 
rapidly to the highest age group. Fourteen is the most common 
age of children in the eighth grade. Comparatively, those in the 
lower age groups are advanced, and those in the higher age groups 
are retarded. It should be noticed that considerably more children 
are advanced by one year than are retarded. In view of the fact
that the children who are two, three, and four years advanced or 
retarded are about equal in each age period, it may be that there is 
some artificial factor, such as an administrative practice, operating 
to make the unevenness in numbers of those who are advanced or 
retarded only one year. The fact that the histogram is symmetrical 
but for the differences at these two ages suggests that the numbers 
advanced or retarded only one year would be more nearly equal, 
if their status depended upon their ability alone, or possibly even 
if we had an indefinitely large number of children for this grade 
to study. 
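The comparison of the children advanced and retarded by the same number of years can be set out from Table XIX. The sketch is illustrative only and takes the most common age, 14, as the point of reference.

```python
CHILDREN = {10: 1, 11: 25, 12: 348, 13: 1330, 14: 1684,
            15: 971, 16: 308, 17: 50, 18: 4}

modal_age = max(CHILDREN, key=CHILDREN.get)   # the most common age

# (years from the modal age, number advanced, number retarded)
comparison = [(k, CHILDREN[modal_age - k], CHILDREN[modal_age + k])
              for k in range(1, 5)]

for years, advanced, retarded in comparison:
    print(f"{years} year(s): advanced {advanced:>5,}  retarded {retarded:>5,}")
```

The difference at one year, 1,330 against 971, is the principal departure from symmetry which the histogram shows.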



Figure XXXI. Distribution of Children in the Eighth Grade, St. Louis
Public Schools, by Ages
(Y = number of children, 0 to 2,000; X = ages, 10 to 19)

In connection with histograms and frequency curves the
consideration of discrete and continuous variables is apropos. It will be
recalled from Chapter III that a discrete variable was defined as
one whose values differed by an assigned amount, whereas the
values of a continuous variable differed by infinitely small amounts.
Logically the discrete variable ought to be presented graphically 
only by a histogram, because a histogram is a form of column 
chart and does not suggest continuity in the series of data. The 
values of a discrete variable show gaps, sometimes small, in the 
series arranged as a frequency distribution. In practice these often 
approach the form of an ideal frequency curve and are so pre- 
sented. If this is done with complete understanding that the varia- 
ble is really discrete and if no attempt is made to draw from it 
inferences which could apply only to a continuous series, the prac- 
tice is not objectionable. The continuous series may properly be 
presented as a smooth curve, because, even if the data do show 
small gaps, the variable changes by amounts infinitely small and, if 
a sufficient number of items were included, would take the form 
of a smooth curve. The histogram may be used to present con- 
tinuous variables, as in Figure XXXI above, where the inde- 
pendent variable, age, may vary by any amount however small, 
but it may err on the side of suggesting that the variable is discrete 
when it is not. This practice is also permissible, if it is clear to the 
worker that his data really should be presented as a frequency 
curve and if no misunderstanding would result. 

The data in Table XIX are a continuous series and may be used 
to illustrate the development of a smooth frequency curve from 
a histogram. As pointed out, this distribution is approximately 
symmetrical in form and approaches the form of distribution rep- 
resented by a “normal” or a bell-shaped curve. In scientific work 
this type of curve has many uses, and when we come to discuss the 
theory of probability in Chapter XII, it will receive extended 
attention. Intermediate between the histogram and the smooth 
frequency curve is the frequency polygon. The histogram is composed
of a number of vertical bars. If the mid-points of the tops of
these bars are connected by straight lines, the result is a frequency 
polygon. Figure XXXII illustrates this point. 
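
The construction just described can be sketched in a few lines of Python. The age classes and counts below are hypothetical, since Table XIX is not reproduced in this section, and `polygon_vertices` is only an illustrative name.

```python
def polygon_vertices(lower_bounds, width, frequencies):
    """Return one (mid-point, frequency) pair per histogram bar.

    Joining these pairs in order with straight lines gives the
    frequency polygon.
    """
    if len(lower_bounds) != len(frequencies):
        raise ValueError("each class needs exactly one frequency")
    return [(lo + width / 2, f) for lo, f in zip(lower_bounds, frequencies)]


# Hypothetical one-year age classes and counts (not the St. Louis data).
ages = [11, 12, 13, 14, 15, 16]
counts = [40, 310, 720, 650, 240, 30]

vertices = polygon_vertices(ages, 1, counts)
for x, y in vertices:
    print(f"mid-point {x:4.1f}  frequency {y}")
```

Each bar contributes exactly one vertex, so the polygon stays as close to the original counts as the histogram does.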

Figure XXXII. Distribution of Children in the Eighth Grade, St.
Louis Public Schools, by Ages, Showing the Relations Between a
Histogram and a Frequency Polygon

A polygon is commonly defined as a geometrical figure having
more than four angles. The angles of the above polygon are located
at the mid-points of the tops of the vertical bars and are
formed by the straight lines which connect the mid-points. It will
be noticed that the form of the frequency polygon emphasizes
better than the histogram the concentration of children at the
middle age period and also the approximately symmetrical
distribution of children below and above the point of concentration. 5
If the purpose is further to emphasize the symmetry of the
distribution, the frequency polygon may be smoothed by drawing a
free-hand line around the polygon. This is illustrated in Figure
XXXIII.

Figure XXXIII. Distribution of Children in the Eighth Grade, St.
Louis Public Schools, by Ages, Comparing the Frequency Polygon and
the Smoothed Frequency Curve

This smoothed frequency polygon is still not entirely symmetri- 
cal. It bulges a little on the left side, and it is pushed inward a little 
on the right side. There are two possible explanations of why these 
data distribute themselves in this slightly asymmetrical way: this 
may be the normal distribution of children of different ages in the 
eighth grade, or an artificial factor may be producing the asym- 
metry. A much larger number of children might tend to remove 
the apparent asymmetry. Before drawing any conclusion regarding 
the natural distribution of such data, that is, before one can de- 
termine the statistical law describing them, further experimenta- 
tion is necessary. Nevertheless, it is interesting to place a 
symmetrical curve on the polygon in Figure XXXII to see in 
what respects it differs from the actual distribution. This symmetri- 
cal curve, when smoothed by proper methods, is known as an ideal 
frequency curve. Chapter XII, which deals with the theory of 
probability, will describe methods for fitting an ideal curve to any 
frequency distribution of this general type. For the present it will 
suffice to see how the two curves look when superimposed in 
Figure XXXIV. 

If the children were evenly distributed on both sides of the 
arithmetic average, the distribution would be entirely symmetrical 
and would be represented by the broken curve in Figure XXXIV. 
In the three preceding charts it will be apparent that there is a 
gradation from the histogram to the ideal frequency curve. The 
histogram is the simplest representation of the data. The frequency 
polygon is still close to the original data, but the free-hand 
smoothed frequency curve is a step further away from the data. 
The ideal frequency curve is entirely theoretical; it represents the
form which the distribution of the 4,721 children would take if 
they conformed to the ideal distribution. It is useful for com- 
paring the departure of the actual data from the ideal distribution, 
or, as it is sometimes called, the “normal probability” curve. If a
larger number of children, in this case, were used and their
distribution approached nearer and nearer the ideal frequency
distribution, it might be correct to say that the ideal curve is the general
statistical law which describes the data. Then any particular study
of the age distribution of eighth grade children which failed to
approach the ideal curve would obviously be an unrepresentative
study, either because not enough children had been included or
because some other factor had entered into the situation to skew
the curve.

Figure XXXIV. Distribution of Children in the Eighth Grade,
Comparing the Frequency Polygon with the Ideal Frequency Curve
(y = number of children)

5 In this as in other line graphs, the left end of the base line along which the
horizontal scale is measured off is referred to as the lower end of the scale,
and the right end of the base line is correspondingly referred to as the upper
end of the scale. The terms “below” and “above” the point of concentration
in the frequency distribution are used in the same manner.
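
As a rough illustration of comparing actual frequencies with an ideal curve, the Python sketch below computes the frequencies a normal curve would give for each class. It assumes the curve is fitted by matching the mean and standard deviation of the grouped data, which may differ from the method of Chapter XII; the counts are hypothetical and the function names are illustrative only.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal curve up to x."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


def grouped_mean_sd(midpoints, freqs):
    """Total count, mean and standard deviation of grouped data."""
    n = sum(freqs)
    mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
    var = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs)) / n
    return n, mean, math.sqrt(var)


def expected_frequencies(lower_bounds, width, freqs):
    """Frequencies a normal curve with the same mean and s.d. would give."""
    mids = [lo + width / 2 for lo in lower_bounds]
    n, mean, sd = grouped_mean_sd(mids, freqs)
    return [
        n * (normal_cdf(lo + width, mean, sd) - normal_cdf(lo, mean, sd))
        for lo in lower_bounds
    ]


# Hypothetical one-year age classes and counts (not the St. Louis data).
ages = [11, 12, 13, 14, 15, 16]
counts = [40, 310, 720, 650, 240, 30]

for lo, actual, ideal in zip(ages, counts, expected_frequencies(ages, 1, counts)):
    print(f"age {lo}: actual {actual:4d}  ideal {ideal:7.1f}")
```

The differences between the two columns show where the actual distribution bulges or is pushed inward relative to the symmetrical curve.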

The next frequency chart shows how the component parts of a
population problem may be represented graphically. This chart is
taken from a study of school attendance covering the entire United
States. Do all of the major population groups analyzed according
to nativity and race show the same percentage of children in school
at different ages? When the data were assembled and analyzed,
differences were found. Figure XXXV shows the situation.
The native white population shows the highest school attendance
until after the fifteenth birthday. From that point on the “all
other” ranks highest. The Negro group is generally low, but after
the fifteenth birthday the foreign-born white children drop below
all the others. The chart makes possible comparisons at any age
between any two or more population groups, and as a whole gives
a general impression of how these groups attend school.

Figure XXXV. Per Cent of Males Attending School Among the
Native White, Foreign-Born White, Negro, and “All Other”
Population 5 to 20 Years of Age, by Specified Age: 1920 5a
(x = years of age)

5a Ross, Frank A., School Attendance in the United States, 1920, p. 8. United
States Bureau of the Census, 1924.

It will be found in practice that only a small proportion of
series of social data are distributed in the approximate form of the
ideal frequency curve. Most frequency distributions in the field of 
social statistics will lack the perfect symmetry exhibited by the 
bell-shaped curve. They will not be nearly so symmetrical as the 
frequency polygon constructed from data for the age distribution 
of eighth grade children. In fact, they will be noticeably
asymmetrical; that is, they will look as if they had been pushed toward
one side or the other. Using normal in the sense of average, these 
asymmetrical distributions may be normal for the data used. It 
may be possible to fit a smoothed, or generalized, curve to these 
asymmetrical distributions to which all sample studies would 
closely conform. Table XX gives the number of cities in the
United States, with 25,000 or more population, whose population
had increased less than 120 per cent between 1920 and the time of
the 1930 census, distributed in 10 per cent class-intervals:

TABLE XX

336 Cities in the United States, with 25,000 or More
Population, Which Increased Less Than 120 Per Cent
Between 1920 and 1930

Percentage        Number of
Increase          Cities

Total                336

Under 10              69
10- 19                78
20- 29                62
30- 39                41
40- 49                23
50- 59                20
60- 69                 7
70- 79                16
80- 89                 4
90- 99                 6
100-109                6
110-119                4


The most common rate of increase lies between 10 and 20 per cent.
Few cities showed an increase of over 40 per cent. It should be
noted that a few cities lost population, while a few had increases
greater than 120 per cent. The number which had decreases is
about equal to the number which had more than 120 per cent
increases; so for convenience in constructing the chart these extremes
were omitted. But they would have to be included if one were
computing the average change in population of this class of cities.
For the data presented the concentration is at the lower end of
the scale. For some other series, such as the age distribution of
persons dying of heart disease, the concentration would be at the
upper end of the scale.

Figure XXXVI. 336 Cities in the United States, with 25,000 or More
Population, Which Increased Less Than 120 Per Cent Between 1920 and
1930 (y = cities)
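
The concentration at the lower end of the scale can be checked directly from the frequencies in Table XX. The short Python sketch below finds the most common class and the cumulative percentage of cities up to each class; the variable names are illustrative only.

```python
# Class labels and frequencies as given in Table XX.
classes = ["Under 10", "10-19", "20-29", "30-39", "40-49", "50-59",
           "60-69", "70-79", "80-89", "90-99", "100-109", "110-119"]
cities = [69, 78, 62, 41, 23, 20, 7, 16, 4, 6, 6, 4]

total = sum(cities)

# The modal class is the one with the largest frequency.
modal_class = classes[cities.index(max(cities))]

# Cumulative percentage of cities at or below each class.
cumulative = []
running = 0
for n in cities:
    running += n
    cumulative.append(round(100 * running / total, 1))

print("total cities:", total)
print("most common class:", modal_class)
for label, pct in zip(classes, cumulative):
    print(f"{label:>8}  {pct:5.1f}")
```

Reading down the cumulative column shows how quickly the cities pile up in the first few classes.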

Figure XXXVI has one technical difference from preceding 
charts. It shows percentage as the independent variable and, hence, 
plotted on the horizontal scale. In general percentage will be the 
dependent variable, but in this case the maximum percentage is 
arbitrarily fixed, and the class-intervals are fixed. But the number 
of cities in each class is not fixed; the only way it can be determined
is to count the cities falling into each class. This number
varies according to the length of the class-interval, which here is
10. In this problem we are concerned with the frequency of cities
in percentage groups; this fact determines which is the independent
and which the dependent variable.

8. MISCELLANEOUS GRAPHIC DEVICES 

Besides such curves as have been discussed, there are a great
many devices used for graphic presentation. Some will be
illustrated at this point, but before giving the illustrations a caution
must be mentioned. It was pointed out earlier that graphic
methods are ways of translating concrete data into symbols expressed as
lines, surfaces, and sometimes cubes. Where equations are
presented along with the graph, any of these geometrical concepts
may be employed, though surfaces are more apt to be misleading
than lines, and cubes more difficult to utilize with clarity than
surfaces. It is easier for the eye to grasp the relative size of two
lines of varying lengths than two surfaces of varying areas or two
solids of varying cubic content.

Figure XXXVII. Representing the Percentage of Change in Population
of Indianapolis from 105,436 in 1890 to 314,194 in 1920 (bars for
1890 and 1920 on a scale of 0 to 300)



For example: The city of Indianapolis increased in population
from 105,436 in 1890 to 314,194 in 1920, almost exactly tripling
the population. Let us represent the growth of the city, first, by
narrow bars (a bar is a narrow space enclosed between two lines
the length of which is so obvious that width is neglected by the
eye, and not used in fact); second, by squares; third, by cubes.

Figure XXXVIII. Representing Percentage Change in Population of
Indianapolis from 105,436 in 1890 to 314,194 in 1920 by Means of Areas
(squares with sides of 10.0 for 1890, 100%, and 17.3 for 1920, 298%)

Figure XXXIX. Representing Percentage Change in Population of
Indianapolis from 105,436 in 1890 to 314,194 in 1920 by Means of Cubes
(cubes with edges of 4.6 for 1890, 100%, and 6.7 for 1920, 298%)

Although it might not be known what the exact population was
in 1890 and 1920, a glance at Figure XXXVII would immediately
suggest that in the thirty-year period the population had just
about tripled. But an examination of Figure XXXVIII, in which
the area of the large square is three times that of the small square,
does not suggest to the eye a tripling of the population; the second
square does not appear to be three times the size of the small one.
When cubes are used in Figure XXXIX, the relative magnitudes 
of the two cubes are still less obvious. Sometimes a small man and 
a large man are used to illustrate growth in population, but this is 
a special case of the cubic representation, because the figure of a 
man is three dimensional, albeit irregular in contour. The illusion 
would be present, even though only the height of the men were 
intended for comparative purposes, because the other dimensions 
of the large man would give the effect of less height than he 
possessed in fact. Clearness will be enhanced in graphic presenta- 
tion of this sort if comparisons are made by one dimension only. 
There may be special cases where the square, the rectangle, or the 
cube is most satisfactory, but they do not occur often. 
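
The reason the eye is misled can be seen from the arithmetic. The Python sketch below, using the Indianapolis figures quoted above, shows how much each linear dimension grows when the same increase is drawn as a bar, a square, or a cube; the variable names are illustrative only.

```python
pop_1890 = 105_436
pop_1920 = 314_194

# Ratio of the 1920 population to the 1890 population.
ratio = pop_1920 / pop_1890

# A bar carries the whole ratio in its length.
bar_length_ratio = ratio
# A square carries it in its area, so the side grows by the square root.
square_side_ratio = ratio ** 0.5
# A cube carries it in its volume, so the edge grows by the cube root.
cube_edge_ratio = ratio ** (1 / 3)

print(f"population ratio:  {ratio:.2f}")
print(f"bar length ratio:  {bar_length_ratio:.2f}")
print(f"square side ratio: {square_side_ratio:.2f}")
print(f"cube edge ratio:   {cube_edge_ratio:.2f}")
```

A near tripling of population lengthens the bar almost threefold, but the side of the square grows by less than three-quarters and the edge of the cube by less than half, which is why the larger square and cube look too small.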

The bar chart, or the column chart which differs from the bar 
chart only in the fact that the bars are erected vertically on the 
base line, is one of the simplest and most easily understood of all 
graphic devices. Instead of using only two years, as in Figure 
XXXVII, one may use the bar chart to compare the population at 
the time of the census in several different years. For non-technical 
and non-functional presentation of statistical data the bar chart is 
widely used. It does not take the place of more refined statistical 
analysis but is satisfactory for presenting the elementary implica- 
tions of some data. 

Variations of the bar chart are the hundred-per-cent chart, 
Figure XL, and the double-bar chart, Figure XLI, below: 

Figure XL. Percentage of the Population of the United States
Represented by Each Race, 1920 (hundred-per-cent bar: White 89.7,
Negro 9.9, Other .4)

TABLE XXI

Percentage of the Population of the United States
Represented by Each Race, 1920 1

Race              Percentage of
                  Population

White                 89.7
Negro                  9.9
Indian                  .2
Chinese                 .1
Japanese                .1

1 Abstract of the Census, 1920,




The percentages of Indians, Chinese, and Japanese are so small 
that they were combined and represented as “other.” The total 
length of the bar is 100 per cent. Hence, the division into parts 
representing the proportions of different races in the population 
shows the relative importance of the races. 
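
The preparation of a hundred-per-cent bar from Table XXI can be sketched in Python as follows. The one per cent cut-off for combining small groups and the ten-unit bar length are assumptions made for illustration, and `combine_small` is a hypothetical helper name.

```python
# Percentages as given in Table XXI.
shares = {"White": 89.7, "Negro": 9.9, "Indian": 0.2,
          "Chinese": 0.1, "Japanese": 0.1}


def combine_small(shares, threshold=1.0, label="Other"):
    """Merge every share below the threshold into one combined segment."""
    kept = {k: v for k, v in shares.items() if v >= threshold}
    other = round(sum(v for v in shares.values() if v < threshold), 1)
    if other > 0:
        kept[label] = other
    return kept


segments = combine_small(shares)

# Length of each segment on a bar ten units long (assumed length).
bar_length = 10.0
lengths = {k: round(bar_length * v / 100, 2) for k, v in segments.items()}

for name in segments:
    print(f"{name:<6} {segments[name]:5.1f} per cent  length {lengths[name]:.2f}")
```

Because the whole bar stands for 100 per cent, each segment length is simply the percentage applied to the total length.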

Figure XLI illustrates the use of the double-bar graph. The 
data are taken from criminal statistics and are given in Table 
XXII: 


TABLE XXII

Percentage of White and Negro Races Among the Commitments
to Prisons and Reformatories, 1910 and 1923 1

                Commitments in Per Cent
Race              1910        1923

All               99.2        98.2
White             66.3        74.2
Negro             32.9        24.0

1 Prisoners, 1923. Report of the United States Bureau of the
Census.

Figure XLI. Percentage of White and Negro Races Among the
Commitments to Prisons and Reformatories, 1910 and 1923 (one bar
for each year, divided into White and Negro)


This graph presents the relative percentages of commitments of 
whites and Negroes to prisons and reformatories in 1910 and 1923 
and brings out the point that Negroes have declined in proportion 
to whites in commitments to penal institutions of the United 
States. 

A variation in bar charts is shown below. The purpose of this
graph is to compare the percentage distribution by ages of the
total population of the United States in 1920 and of the gainfully
employed in the same year to emphasize the age-group variations.
This kind of bar chart makes clear the relation of the occupied
group to the total population. Few persons under 16 years of age
are employed, but in the next two age groups the percentages
gainfully employed are higher than the corresponding percentages
of the total population above 10 years of age in these two age
groups. In the upper age group the percentage employed falls
below the corresponding percentage of the population. This kind
of chart may be used to advantage in comparing the age
distribution of persons who receive the services of social agencies with
similar age groups in the population.

TABLE XXIII

Age Distribution of the Population over 10 Years of Age
and of the Gainfully Employed of Similar Ages Expressed
in Percentage 1

Age            Percentage of     Percentage of
               Population        Gainfully Employed

All               100.0              100.0
10-15              14.9                2.5
16-44              57.5               69.4
45-64              21.5               23.8
Over 65             5.9                4.1
Unknown              .2                 .2

1 United States Census of Occupations, 1920.

Figure XLII. Age Distribution of the Population and of the Gainfully
Employed Over 10 Years of Age (paired bars by age group, one for
population and one for employed)

The circle, or sector, chart (otherwise called the pie chart)
resembles both the surface chart and the hundred-per-cent chart
in that the area bounded by an arc and two radii is used and that
this area is a part of the whole circle which is 100 per cent. This
kind of surface chart does not have the disadvantages of the
quadrilateral or the triangle, because the whole circle is conceived
as representing all the data, and the sectors are parts of this whole.
This relationship brings out the relative magnitude of each
division. Figure XLIII illustrates this type of graph:


Figure XLIII. New Commitments to Indiana Hospitals for the Insane
by Age Groups, Year Ending September 30, 1929 (circle chart; legible
sector labels: 40-60 Years, 33.9%; Under 20 Years, 12.3%)

TABLE XXIV

New Commitments to Indiana Hospitals for the Insane by
Age Groups, Year Ending September 30, 1929 1

Age                        New Commitments     Per Cent

All                             1,642            100.0
Under 20                          201             12.3
20-40                             552             33.6
40-60                             557             33.9
Over 65 and Unknown               332             20.2

1 Indiana Bulletin of Charities and Corrections, No. 182, p. 185.


The division of the circle into parts showing each age group’s pro- 
portion of the total commitments makes clear the age groups from 
which the hospitals for the insane draw most of their patients. 
It is probably a better form than the hundred-per-cent bar, and 
it does not have the objectionable features characteristic of rec- 
tangular surfaces. 
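
Laying out such a chart requires the angle of each sector, which is the group's share of the total applied to the 360 degrees of the circle. The Python sketch below does this for the counts in Table XXIV; the variable names are illustrative only.

```python
# New commitments by age group, as given in Table XXIV.
commitments = {
    "Under 20": 201,
    "20-40": 552,
    "40-60": 557,
    "Over 65 and Unknown": 332,
}

total = sum(commitments.values())

# Each sector takes the same share of 360 degrees as of the total.
angles = {age: round(360 * n / total, 1) for age, n in commitments.items()}

for age, angle in angles.items():
    print(f"{age:<20} {angle:6.1f} degrees")
```

The sector angles add up to the full circle, apart from small rounding differences.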

Figure XLIV. Location of Felonies, January to June, 1929

Geographic data are often satisfactorily presented by the use of
a map of the area from which the data are taken; this is divided
into small subdivisions, such as states for the United States,
counties for a state, townships for a county, or wards or census tracts
for cities. Such maps are used in the so-called ecological studies
of social problems which have been made in Chicago and
elsewhere. Since the Bureau of the Census began tabulating some of
the population data for cities by “census tracts,” the use of maps
to present the distribution of disease, crime, poverty, etc., has
greatly increased. The population of the tracts is small and usually
highly homogeneous as to race, nationality, economic status, age,
and sex. If the data of crime, disease, and poverty are distributed
by census tracts, it is possible to make important studies of the
occurrence of social-problem phenomena. To a lesser extent
counties may be used to study problems on a state-wide basis. Figure
XLIV is a good illustration of this use of the map. (See p. 182.)
Figure XLIV shows the percentage of convicted felons whose
crimes were committed in each census tract of Indianapolis from
January to June, inclusive, 1929. Tracts 56 and 78 include the
main business part of the city, and it will be noticed that 20.97
per cent of all felonies were committed in these two tracts. Some
other contiguous tracts also had high rates of crime. It is clear that
police protection should be concentrated in tracts 56 and 78 and
to a lesser extent in a few other tracts. When these data are related
to other facts obtained by the census, additional inferences of
importance may be made. 6

A variation of the cartogram is shown in the next chart. This is 
taken from a recreation study made in Indianapolis: 


TABLE XXV

Distribution of Homes of Children Using a Public
Playground 1

Distance of Home from      Number of     Percentage
Playground                 Homes         of Homes

All                           146          100.00
Under 1/4 mile                 86           58.90
1/4 to 1/2 mile                36           24.65
Over 1/2 mile                  15           10.27
No address                      9            6.18

1 Lies, Eugene T., The Leisure of a People, Indianapolis Council
of Social Agencies, 1930, p. 132.

6 Data for this map were collected by the writer but have not been published.

Figure XLV. Distribution of Homes of Children Using a Public
Playground Shown by One Dot for Each Home and by Concentric Circles
of a Quarter-Mile and a Half-Mile Radius

The investigator wanted to determine the soundness of the
present distribution of playgrounds in Indianapolis from the point
of view of maximum use. For three days he adopted the plan of
putting some one in the playground to tag every child and to
learn his home address. Then each home was indicated by a dot
on the map. When this was completed, he drew a circle with a
radius of a quarter of a mile from the center of the playground.
Then a concentric circle with a half-mile radius was drawn. The
homes inside the first circle were counted; then those lying
outside the first but inside the second circle were counted. This method
gave a basis for estimating how far children would go to reach a
playground and where future playgrounds ought to be located.
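
The counting step described above can be sketched in Python. The home coordinates below are hypothetical, measured in miles east and north of the playground, and `ring_counts` is an illustrative name rather than anything used in the study.

```python
import math


def ring_counts(homes, center, radii):
    """Count homes in each ring around the center.

    radii must be in increasing order. The result has one count per
    circle (homes inside it but outside any smaller circle) plus a
    final count for homes beyond the largest circle.
    """
    counts = [0] * (len(radii) + 1)
    for x, y in homes:
        distance = math.hypot(x - center[0], y - center[1])
        for i, radius in enumerate(radii):
            if distance <= radius:
                counts[i] += 1
                break
        else:
            counts[-1] += 1
    return counts


# Hypothetical home locations, in miles from the playground at (0, 0).
homes = [(0.1, 0.1), (0.2, -0.1), (0.3, 0.3),
         (-0.45, 0.1), (0.6, 0.2), (0.0, -0.2)]

counts = ring_counts(homes, (0.0, 0.0), [0.25, 0.5])
print("within 1/4 mile:", counts[0])
print("1/4 to 1/2 mile:", counts[1])
print("over 1/2 mile:  ", counts[2])
```

Dividing each count by the number of homes gives percentages of the kind shown in Table XXV.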
The type of cartogram used by Mr. Lies has great usefulness,
especially in presenting the report of a community survey which
is to be read by a great many people of differing interests and
varying amounts of time available for studying the bulky text of
the report. The charts not infrequently “sell” the survey and its
recommendations.

Figure XLVI. Percentage of the White Population in Counties of
Virginia Who Belong to Churches 1

1 Hamilton, C. Horace, and Garnett, William E., “The Role of the
Church in Rural Community Life in Virginia,” Virginia Agricultural
Experiment Station, Bulletin 267, 1929, p. 11.

For the purpose of showing certain general social facts about a 
state and indicating variations in different parts, a cartogram on 
a county basis is useful. Figure XLVI was constructed to show the 
percentage of church membership among the white population in 
each county of Virginia. 

The variations are marked, ranging from less than 25 per cent 
of the population in 9 counties to 66 per cent or more in 7 counties. 
A technical criticism might be made of this chart on the ground 
that it shows too much. The percentages of “independent cities” 
are a little confusing, and also the figures boxed in the upper 
right-hand corner are not immediately clear. If only the map, the 
title, and the legend had been given, the import would have been 
obvious. A little study soon clarifies the meaning of the percent- 
ages, however. It is a chart which would make anyone interested 
in the church as a useful social institution stop and ask questions 
and ponder the meaning of the wide variations in apparent interest 
in the church. 

Diagrammatic presentation of the plan of organization of an
agency or institution is widely used. Governmental organizations,
corporations, and social agencies are often complicated in their
structure. It is almost impossible for anyone, even an official, to
visualize the detail of the organization of a federal department,
unless his imagination is aided by some graphic means. Also, it is
difficult to grasp the ramifications and divisional relationships of
a large city school system. But a chart sets these out clearly.
Figure XLVII shows the organization of the attendance work of the
New York City school system. 7

7 United States Children’s Bureau, Publication No . if,

9. STANDARD RULES OF GRAPHIC PRESENTATION

The Joint Committee on Standards of Graphic Presentation,
composed of representatives of fifteen scientific societies and two
government bureaus, worked out the more generally accepted
rules for graphic presentation and published their report in the
Quarterly Publication of the American Statistical Association,
December, 1915. These rules have been quite generally used and are
reproduced here for reference purposes. 8

1. The general arrangement of a diagram should proceed from
left to right.
(Illustration 1: population by year)

2. Where possible represent quantities by linear magnitudes,
as areas or volumes are more likely to be misinterpreted.
(Illustration 2: tons by year; 1900, 270,588; 1904, 555,031)

3. For a curve the vertical scale, whenever practicable, should be so
selected that the zero line will appear on the diagram.
(Illustration 3: sales, $0 to $1000, by months)

4. If the zero line of the vertical scale will not normally appear
on the curve diagram, the zero line should be shown by the use of a
horizontal break in the diagram.
(Illustration 4: per cent by hour)

5. The zero lines of the scales for a curve should be sharply
distinguished from the other coordinate lines.
(Illustration 5a: population; Illustration 5b: R.P.M. by miles per
hour; Illustration 5c: gain or loss)

6. For curves having a scale representing percentages, it is
usually desirable to emphasize in some distinctive way the 100 per
cent line or other line used as a basis of comparison.
(Illustration 6a: per cent utilized by year; Illustration 6b:
relative cost by year)

7. When the scale of a diagram refers to dates, and the period
represented is not a complete unit, it is better not to emphasize the
first and last ordinates, since such a diagram does not represent the
beginning or end of time.
(Illustration 7: population by year)

8. When curves are drawn on logarithmic coordinates, the limiting
lines of the diagram should each be at some power of ten on the
logarithmic scales.
(Illustration 8: population by year)

9. It is advisable not to show any more coordinate lines than
necessary to guide the eye in reading the diagram.
(Illustrations 9a and 9b)

10. The curve lines of a diagram should be sharply distinguished
from the ruling.
(Illustration 10: population by year)

11. In curves representing a series of observations, it is
advisable, whenever possible, to indicate clearly on the diagram
all the points representing the separate observations.
(Illustration 11c: pressure in lbs. per sq. in.)

12. The horizontal scale for curves should usually read from
left to right and the vertical scale from bottom to top.
(Illustration 12: population)

13. Figures for the scales of a diagram should be placed at the
left and at the bottom or along the respective axes.
(Illustration 13a: population; Illustration 13b: gain or loss)

14. It is often desirable to include in the diagram the numerical
data or formulae represented.
(Illustrations 14a, 14b, and 14c)

15. If numerical data are not included in the diagram it is
desirable to give the data in tabular form accompanying the
diagram.

Year        Population

1840        17,069,453
1850        23,191,876
1860        31,443,321
1870        38,558,371
1880        50,155,783
1890        62,622,250
1900        75,994,575
1910        91,972,266

16. All lettering and all figures on a diagram should be placed so
as to be easily read from the base as the bottom, or from the
right-hand edge of the diagram as the bottom.

17. The title of a diagram should be made as clear and complete as
possible. Subtitles or descriptions should be added if necessary to
insure clearness.
(Illustration 17, by months 1 to 12: Aluminum Castings Output of
Plant No. 2, by Months, 1914. Output is given in short tons. Sales
of Scrap Aluminum are not included.)

TABLE XXVI

Weighted Aggregates of Public Welfare Work and the
Annual Trend Values of the Volume of Work, Indiana
Board of State Charities, 1900 to 1927 1

Year        Weighted       Annual Trend
            Aggregates     Values

Average     $126,917       $126,917

1900        $112,496       $110,583
1901         112,899        111,867
1902         112,560        113,151
1903         111,116        114,435
1904         111,656        115,719
1905         117,133        116,003
1906         117,133        117,287
1907         114,149        118,571
1908         121,505        119,855
1909         117,465        121,139
1910         118,345        122,423
1911         120,123        123,707
1912         124,344        124,991
1913         124,786        126,275
1914         132,039        127,559
1915         145,673        129,843
1916         140,469        131,127
1917         141,639        132,411
1918         126,688        133,695
1919         121,428        135,979
1920         117,109        137,263
1921         129,862        138,547
1922         136,027        139,831
1923         126,126        141,115
1924         135,807        142,399
1925         146,468        143,683
1926         152,806        144,967
1927         160,566        146,251

1 Weighted aggregates are the sum of persons aided per 100,000
population by each Indiana agency or institution multiplied by the
median cost per person for the respective agencies. Data from
unpublished manuscript by the author.

8 Quarterly Publication of the American Statistical Association, Vol. 14, pp. 
790-797. The following scientific societies and government bureaus had repre- 
sentation on the Joint Committee: American Society of Mechanical Engineers, 
at whose invitation the Committee was formed, American Statistical Association, 
American Institute of Electrical Engineers, American Association for the Ad- 
vancement of Science, American Academy of Political and Social Science, 
American Genetic Association, American Economic Association, United States 
Bureau of the Census, United States Bureau of Standards, American Associa- 
tion of Public Accountants, American Chemical Society, American Institute of 
Mining Engineering, American Psychological Association, Actuarial Society of 
America, and the Society for the Promotion of Engineering Education. 





10. EXERCISES 

1. Construct a straight line graph on the natural scale from the 
annual trend values given in Table XXVI. 

2. Compute the annual growth of $1,000 at 6 per cent interest
for a period of 10 years and construct a straight line graph on
the natural scale for the annual amounts of the principal plus
the simple interest accrued.

3. Tables XXVII and XXVIII give the population per square 
mile in continental United States, excluding Alaska, from 1790 
to 1930 and patients per 100,000 population in Indiana hospi- 
tals for the insane on the last day of the fiscal year from 1900 
to 1927. Plot these data: 

(a) on the natural scale; 

(b) on the semi-logarithmic scale. Explain the differences 
and the significance of each curve. 

4. Tables XXIX and XXX give cumulative data. Make a chart 
from the data in each table and interpret the meaning of the 
charts. 

5. Make bar charts representing the data in Tables XXXI and 
XXXII. 

TABLE XXVII

Population per Square Mile in Continental United
States, Excluding Alaska, 1790 to 1930

Year        Population per
            Square Mile

1790            ...
1800            ...
1810            4.3
1820            5.5
1830            7.3
1840            9.7
1850            7.9
1860           10.6
1870           13.0
1880           16.9
1890           21.2
1900           25.6
1910           30.9
1920           35.5
1930           41.3



TABLE XXVIII

Patients per 100,000 Population in the Indiana Hospitals for
the Insane on the Last Day of the Fiscal Year, 1900 to 1927 1

Year        Patients per 100,000
            Population

1900            173.4
1901            173.9
1902            179.9
1903            178.2
1904            188.3
1905            192.1
1906            195.9
1907            188.0
1908            188.9
1909            192.6
1910            198.0
1911            200.6
1912            212.3
1913            215.6
1914            216.9
1915            219.3
1916            219.9
1917            220.9
1918            207.4
1919            207.9
1920            206.6
1921            210.5
1922            217.3
1923            218.2
1924            217.3
1925            223.3
1926            226.2
1927            225.1


1 From unpublished manuscript by the author. 

TABLE XXIX

Cumulative Percentages of the Budget ($72,000) Expended
by a Charitable Agency, Fiscal Year 1929-30, Compared
with the Estimated Average Monthly Budget Requirements

               Per Cent of Budget,    Per Cent of Budget,
Month          Average                Expenditures in
               Requirements           1929-30

November             7.3                   10.4
December            17.4                   26.8
January             29.5                   47.7
February            41.5                   68.4
March               52.8                   87.4
April               61.5                  103.0
May                 68.9                  115.1
June                75.3                  124.2
July                81.7                  133.6
August              88.0                  141.6
September           93.8                  148.3
October            100.0                  156.5



TABLE XXX 

Cumulative Percentages of Males in the Population of 
Indianapolis and of Males Employed by Six Indianapolis 
Firms by Age Groups 


           Per Cent of Males,    Per Cent of Males
Age        City,                 Employed by Six
           1920 1                Firms, 1930 2

15-19           10.3                   4.8
20-24           23.0                  23.7
25-29           36.5                  42.2
30-34           48.9                  57.7
35-39           61.2                  70.8
40-44           71.0                  80.5
45-49           80.2                  87.2
50-54           87.8                  92.8
55-59           92.9                  96.6
60-64           9?.8                  98.8
65-69          100.0                 100.0


1 United States Census, 1920. 

2 Unpublished manuscript by the author. 


TABLE XXXI 

Percentage of Urban and Rural Population in the United
States, 1890 to 1930 1

Year        Urban Population,     Rural Population,
            Per Cent of Total     Per Cent of Total

1890             35.4                  64.6
1900             40.0                  60.0
1910             45.8                  54.2
1920             51.4                  48.6
1930             56.2                  43.8


1 United States Census, 1920, and Population Bulletin , First 
Series , 1931. 


TABLE XXXII 

Percentage of Total Persons Receiving Poor Relief in Poor 
Asylums and from Township Trustees (Outdoor Relief) in 
Indiana in Specified Years 1 


Year        Poor Asylums,         Township Trustees,
            Per Cent of Total     Per Cent of Total

1900              6.3                  93.7
1905              6.4                  93.6
1910              6.7                  93.3
1915              3.4                  96.6
1920              6.5                  93.5
1925              4.6                  95.4


1 Indiana Bulletin of Charities and Corrections, No. 182,





6. Make sector charts representing the data in Tables XXXIII 
and XXXIV. 

TABLE XXXIII 

Inmates in State Penal and Correctional Institutions per
100,000 Population, September 30, 1929 1

                    Number per 100,000
                    Population

Felons                   126.6
Misdemeanants             41.7
Juveniles                 25.9


1 Indiana Bulletin , No. 182. 

TABLE XXXIV 

Expenditures of the State Government of New York by 
Groups, Percentage Going to Each, 1920 1 


Group of State          Percentage of
Expenditures            Expenditures

All                        100.1
Social                      47.6
Protection                  16.5
Administration              11.4
Construction                24.6


1 Clark, Harold F., The Cost of Government and the Support of 
Education , p. 29. Teachers College, Columbia University, 1924. 

7. Using data from the census of 1930, construct a cartogram of 
your state showing the percentage of foreign-born population 
in each county. 

8. Using data from the census of 1930, construct 
(a) a histogram, and 

(b) a frequency curve of the age distribution of the population
of the United States. 

11. REFERENCES

Chaddock, Robert E., Principles and Methods of Statistics, Chap. XVI.

Lovitt, William V., and Holtzclaw, Henry F., Statistics, Chaps. V and VI.

Mills, Frederick C., Statistical Methods, Chap. II.

Mudgett, Bruce D., Statistical Tables and Graphs, Chaps. II and III.

Whipple, George C., Vital Statistics, Chap. II.



CHAPTER VIII 


Measures of Central Tendency 


1. INTRODUCTION

The measure of central tendency, or the average, of a number of 
observations is probably the most commonly used method of sta- 
tistical analysis. Almost any person with a common school educa- 
tion can think in terms of an average individual selected from a 
collection of similar individuals. But such a person may not know 
how to compute any measure of central tendency for the collec- 
tion. Furthermore, it is necessary to know what kind of measure 
of the central tendency of the data is best for the purpose in mind. 
In colloquial language “average” is almost synonymous with 
“usual” or “most common.” Among those who are familiar with 
statistical language, it generally means the arithmetic average. 
But there are several kinds of averages, and in order to empha- 
size the fact that they represent much the same thing, though 
differing in size and quality, the averages are referred to in this 
chapter as measures of central tendency — the tendency of the 
values of the individual items in any collection of data to cluster 
around some middle value. 

As used in statistics, an average is a quantitative concept. It 
implies that some trait of the individuals can be measured and 
that an average value can be found for the separate values of 
this trait observed in individuals possessing it. In practice an aver- 
age is computed for both variables and attributes, though strictly 
speaking the term should be used only in connection with true 
variables, that is, traits capable of being measured or counted. The 
central tendency of the values of the different items may be
expressed by any of the averages, but in all cases it will be a
quantity. 

Another characteristic of an average is that it is a value typical 
of the data from which it is computed. It may be the most common
value actually found, as the mode; or the middle value in a
series arranged from lowest to highest, as the median; or it may
be a value from which the minus deviations and plus deviations
are equal, as the mean; or it may be a variation of the mean
arrived at by taking the product of all the items and extracting
the appropriate root, as the geometric mean. In any case it is a
type value for the whole series. It can be used to represent the 
series in comparison with other type values of similar data. An 
average tells little about the individual items in a series; the actual 
variations of their values are disregarded, unless some method of 
relating variations to the average is used. Nevertheless, the typical 
value is useful as a shorthand description of the data, and is often 
an early step in much more complex statistical analysis. 
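
The four type values named above can be illustrated with a minimal Python sketch. The five sample values are hypothetical, chosen only for easy arithmetic, and the function names are illustrative assumptions rather than anything defined in the text.

```python
from collections import Counter
from math import prod

# Hypothetical sample values, chosen only for illustration.
values = [2, 4, 4, 5, 10]

def mode(xs):
    """Most common value actually found in the series."""
    return Counter(xs).most_common(1)[0][0]

def median(xs):
    """Middle value of the series arranged from lowest to highest."""
    s = sorted(xs)
    n, mid = len(s), len(s) // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def arithmetic_mean(xs):
    """Value about which the minus and plus deviations balance."""
    return sum(xs) / len(xs)

def geometric_mean(xs):
    """n-th root of the product of the n items."""
    return prod(xs) ** (1 / len(xs))

print("mode:", mode(values))
print("median:", median(values))
print("arithmetic mean:", arithmetic_mean(values))
print("geometric mean:", round(geometric_mean(values), 2))
```

Each function returns a single quantity standing for the whole series; none of them says anything about how far the individual items lie from it.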

The concept of central tendency is empirical. Experience with 
a great variety of facts has led to the inference that facts of the 
same kind differ in magnitude below and above a certain value 
with a fair degree of symmetry, and that a value may be found 
which is typical of the entire series. The low magnitudes differ 
from this value by about the same amounts as do the magnitudes 
above it. The leaves on a tree differ in length, but the extremely 
short ones and the extremely long ones are relatively rare. The 
heights of soldiers in a regiment differ, but the very short soldiers 
and the very tall ones are few in number as compared with the 
great majority. Some data are found to have the average among 
the low values or among the high values, because the distribution 
of values over the whole range is “skewed” one way or the other. 
For example, the average age of felons is fairly low, as compared 
with the average age of the total population. Consequently, there 
are some extremely high variations of age from the average age 
of felons, but that fact does not discredit the concept of central 
tendency; it merely suggests that some measure of variation from 
the average should be used in connection with it. There is a cen- 
tral tendency in the age distribution of felons. The central 
tendency of the magnitude of similar material facts is a matter of 
observation, and mathematics has provided a method by which 
this central tendency may be measured with a fair degree of
accuracy. That is, the similarities of members of an animal species, 
the recurrent positions of a celestial body with reference to an- 
other celestial body, the usual age of marriage in a population, the 
usual number of children in dependent families, and the commonest
level of intelligence found among delinquent boys were noticed,
and later statistical methods were used to determine the
average magnitude of a trait of a species, the average observed 
position of the celestial body, the average age of males or females 
at marriage, the average number of children in dependent fami- 
lies, and the average intelligence of a group of delinquent boys. 
This is just another way of saying that the statistical method of 
averages is a means by which order is introduced into everyday 
experience and ascertained results are substituted for impressions. 

An average is most significant when the data have a high degree 
of homogeneity. This is particularly to be emphasized in dealing 
with social data, because so many factors affect a datum to render 
it highly variant from other data of perhaps the same general 
type. Age, sex, nationality, and race are factors which must be 
considered in connection with the study of some other social factor, 
such as crime, because they may lower the degree of homogeneity 
and render an average of any sort meaningless. For example, in 
the study of crime the results are more dependable if juvenile 
delinquents are studied separately and if the sexes are studied 
separately. Where one or more nationalities enter into the situa- 
tion, it is usually desirable to consider them separately for the 
purpose of determining the nationality having the highest or 
lowest rate of crime. If a study of wages in a factory is being made, 
the study should be divided into analyses of wages for each sex, and 
wages for office workers and industrial workers. It so happens that 
by custom female workers get relatively less pay for similar work 
than male workers, and the office wage scale is generally quite 
different from the plant wage scale. Homogeneity of data is in- 
creased when such divisions of workers are made. The intelligence 
quotient of an individual is affected by the social class from which 
he comes. If he is taken out of a low social class and put into a 
higher social class, his intelligence quotient frequently rises, or, 
if a better social adjustment within his own social class is made, 
his intelligence quotient is known to rise. The average intelligence 
of all the children in a given school might have some meaning, 
but, if the children could be separated into the social classes to 
which they belong and the average intelligence of each social class 
obtained, the several averages would vary considerably, reflecting 
the heterogeneity of the total school population and showing the 
greater significance of average intelligence when it refers to one 
fairly definite social class. When primary data are to be collected
or secondary data are to be used, early consideration must be given
to their homogeneity. 
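
The point about homogeneity can be sketched in a few lines of Python. The wage figures below are hypothetical and the two group names are assumptions made for illustration; the sketch only shows how a pooled average can sit between two subgroup averages and describe neither.

```python
# Hypothetical weekly wages (dollars) for two kinds of workers in one factory.
wages = {
    "office": [30, 32, 34],
    "plant": [18, 20, 22],
}

def mean(xs):
    return sum(xs) / len(xs)

# Average of all workers taken together, ignoring the division into groups.
pooled = [w for group in wages.values() for w in group]
pooled_mean = mean(pooled)

# Average of each group taken separately.
group_means = {name: mean(ws) for name, ws in wages.items()}

print("pooled mean:", pooled_mean)
print("group means:", group_means)
```

In this made-up case no worker earns the pooled average, while each group average is typical of its own group.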

Five averages are usually recognized. They are the mode, the 
median, the arithmetic mean, the geometric mean, and the harmonic
mean. This chapter discusses all of these except the harmonic
mean, which is omitted because it is not often used.
It is customary in books on statistics to discuss the averages in 
the following order: arithmetic mean, median, mode, and geo- 
metric mean, though some variations in the order do occur. Al- 
though the arithmetic mean is the form of average most used, it 
is not the concept most people have in mind when they use the 
term “average.” What they think of is the “usual” magnitude of 
a factor, and this is the concept of the mode. Hence, pedagogically 
it seems more appropriate to discuss the mode first, and then fol- 
low it with a discussion of the median, which resembles the former 
in one respect, namely, that it is also a position average. The 
arithmetic mean and the geometric mean obviously belong to- 
gether, rather than the arithmetic mean and the median, and of 
the two means the geometric is less well known and less useful. 
For these reasons, the order of discussing the averages will be as 
follows: mode, median, arithmetic mean, and geometric mean. 

Before turning to the methods of computing the averages, 
attention should be called to the variation of values around an 
average, otherwise known as deviation from the average, or dis- 
persion. Too great emphasis should not be placed upon the sig- 
nificance of an average, unless some measure of dispersion is used 
along with it. If the dispersion is small, the inference to be drawn 
is that the homogeneity of the data is high and the average re- 
liable; on the other hand, if the dispersion is great, homogeneity 
is low and the reliability of the average doubtful. Students of the 
social sciences and of social work are interested not only in the 
central tendency of a body of data but also in the dispersion of the 
individual values around the central tendency. Measures of dis- 
persion will be discussed in Chapter IX, but it is important for 
the student to realize at this point that an average requires these 
checks to make its significance clear. 
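
A small sketch may make the warning concrete. The two series below are hypothetical, and the range is used here only as the simplest possible stand-in for a measure of dispersion; it is an assumption of this example, not necessarily one of the measures treated in Chapter IX.

```python
# Two hypothetical series with the same arithmetic mean.
compact = [48, 49, 50, 51, 52]
scattered = [10, 30, 50, 70, 90]

def mean(xs):
    return sum(xs) / len(xs)

def value_range(xs):
    """Difference between the highest and lowest values."""
    return max(xs) - min(xs)

for name, series in (("compact", compact), ("scattered", scattered)):
    print(name, "mean =", mean(series), "range =", value_range(series))
```

Both series have the same average, but the average is a far better description of the first series than of the second.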

2. THE MODE 

The mode is the most common value occurring in a collection
of data. It is the rule-of-thumb average which corresponds to the
concept in the mind of the person untrained in statistics. The
concept may be made clearer if the mode is referred to as the
“fashion.” A fashion refers to the dominant trend of a certain kind
of behavior, such as wearing empire hats or explaining behavior in
terms of psychoanalysis. The difference between the statistical
concept of the mode and the concept of the fashion of the day lies
in the fact that the mode is quantitative while fashion to a large
extent is a qualitative concept, though in some instances it might
be reduced to quantitative definition. Figure XLVIII illustrates
the mode, using I.Q.’s of children in dependent families.
The highest point of the curve, which is on the ordinate at the
mid-point of the class-interval marked I.Q. 75-85, indicates the
mode. The value of the mode is in this class-interval, though the
exact value of it cannot be determined from an examination of
the diagram. Twenty-seven and six-tenths per cent of all the
children had I.Q.’s between 75 and 85, whereas by the same test the
largest grouping of I.Q.’s would theoretically be between 95 and
105. This difference between the mode for the children in dependent
families and those in an unselected group shows that children
in dependent families rate lower in intelligence tests because of
natural inferiority or because of social conditions. The modal
class-interval enables us to make this comparison with the
unselected group. The “average” dependent child, as measured by
the mode, had an I.Q. between 75 and 85.

Figure XLVIII. Distribution of Intelligence Among 451 Children in
Dependent Families

The above method of determining the mode is known as in- 
spection. But inspection may be used in other ways of finding the 
mode. The simplest method is the array. The items are arranged 
in ascending order from the lowest value to the highest, as shown 
in the following table: 

TABLE XXXV

An Array of the Ages of 100 Felons Selected at Random
from Cases Disposed of by the Marion County, Indiana,
Criminal Court in 1930

Age     Age     Age     Age

16      18      20      22
16      18      20      22
17      18      20      22
17      19      20      22
17      19      20      22
17      19      20      22
17      19      20      23
17      19      20      23
17      19      21      23
17      19      21      23
18      19      21      23
18      19      21      24
18      19      21      24
18      20      22      25
18      20      22      25
18      20      22      25

26      30      35      45
26      31      36      46
27      32      38      47
28      32      39      48
28      33      41      49
28      33      43      49
29      34      43      54
30      34      43      55
30      34      45      55



The age group that is the most numerous is the modal group. In 
this array more felons are 20 than any other age. If these ages 
were presented as a frequency distribution in diagram form, it 
could easily be seen that the 20-year group is the largest; that is, 
this age is the most usual age of felons in this group. A larger
number of cases, however, might show the mode to be located
in some other age group.
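
Finding the mode from the array amounts to counting how often each age occurs. The following sketch assumes the frequencies tallied from Table XXXV (the dictionary name `AGE_COUNTS` is an illustrative choice) and picks out the most numerous age.

```python
from collections import Counter

# Number of felons at each age, tallied from the array in Table XXXV.
AGE_COUNTS = {
    16: 2, 17: 8, 18: 9, 19: 10, 20: 11, 21: 5, 22: 9, 23: 5, 24: 2, 25: 3,
    26: 2, 27: 1, 28: 3, 29: 1, 30: 3, 31: 1, 32: 2, 33: 2, 34: 3, 35: 1,
    36: 1, 38: 1, 39: 1, 41: 1, 43: 3, 45: 2, 46: 1, 47: 1, 48: 1, 49: 2,
    54: 1, 55: 2,
}

# Rebuild the array: every age repeated as often as it occurs, lowest first.
ages = sorted(Counter(AGE_COUNTS).elements())

# The modal age is the one with the largest frequency.
modal_age, modal_count = Counter(ages).most_common(1)[0]

print("items in array:", len(ages))
print("modal age:", modal_age, "with", modal_count, "cases")
```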

The mode may be determined roughly from grouped data by 
successive regrouping. Table XXXVI illustrates this method: 

TABLE XXXVI

Location of the Mode by Successive Regrouping of Ages of Felons

                       Four-Year Group            Six-Year Group
Age       2-Year       4-Year      Shift One      6-Year      Shift One
          Interval     Interval    Interval       Interval    Interval
             f            f           f              f           f
(1)         (2)          (3)         (4)            (5)         (6)

16-17        10           29      (Omit 10)         _45_     (Omit 10)
18-19       _19_                     _35_                       _49_
20-21        16          _30_
22-23        14                       19             22
24-25         5            8                                     12
26-27         3                        7
28-29         4            8                         12
30-31         4                        8                         12
32-33         4            8
34-35         4                        5              7
36-37         1            3                                      4
38-39         2                        3
40-41         1            4                          6
42-43         3                        5                          7
44-45         2            4
46-47         2                        5              5
48-49         3            3                                      3
50-51         0                        0
52-53         0            3                          3
54-55         3                   (Omit 3)                    (Omit 3)

Each grouped frequency is entered opposite the first two-year class
of the group it covers.


In each column of frequencies the largest is underlined to indicate 
the modal age group. It will be noticed that the size of the class- 
interval causes the modal group to shift. With a two-year interval 
the mode falls in the 18-19 year class, but in the four-year interval 
it falls in the 20-23 year class. In the six-year interval it falls in
the 16-21 year class. When the first and last class frequencies are 
omitted, as in columns (4) and (6), the mode is between 18 and 
21 years and 18 and 23 years, respectively. Because the mode is 
a position average, the omission of a few extreme items should not, 
and in fact does not, affect it. The class-interval 18-19 appears in 
four out of five of the groupings, and the class-interval 20-21 
appears in four. This would seem to indicate that the mode lies 
between these two groups, or at about 20 years, the latter being 
the mode determined from the array. The method of successive 
regrouping gives what Chaddock has called the crude mode. 
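
The regrouping just described can be reproduced mechanically. The sketch below assumes the age frequencies tallied from Table XXXV; the function names and the `start`, `width`, and `stop` parameters are illustrative assumptions, not notation from the text. Ages lying outside `start`..`stop` are simply left out, which corresponds to omitting the first and last classes in the shifted groupings.

```python
# Number of felons at each age, tallied from the array in Table XXXV.
AGE_COUNTS = {
    16: 2, 17: 8, 18: 9, 19: 10, 20: 11, 21: 5, 22: 9, 23: 5, 24: 2, 25: 3,
    26: 2, 27: 1, 28: 3, 29: 1, 30: 3, 31: 1, 32: 2, 33: 2, 34: 3, 35: 1,
    36: 1, 38: 1, 39: 1, 41: 1, 43: 3, 45: 2, 46: 1, 47: 1, 48: 1, 49: 2,
    54: 1, 55: 2,
}

def regroup(counts, start, width, stop):
    """Class frequencies for intervals of `width` years beginning at `start`.

    The last class begins at or before `stop`; ages outside the classes
    formed are omitted.
    """
    groups = {}
    for lower in range(start, stop + 1, width):
        upper = lower + width - 1
        groups[(lower, upper)] = sum(
            counts.get(age, 0) for age in range(lower, upper + 1)
        )
    return groups

def modal_class(groups):
    """Class-interval with the largest frequency."""
    return max(groups, key=groups.get)

groupings = {
    "2-year": regroup(AGE_COUNTS, 16, 2, 55),
    "4-year": regroup(AGE_COUNTS, 16, 4, 55),
    "4-year, shift one": regroup(AGE_COUNTS, 18, 4, 53),
    "6-year": regroup(AGE_COUNTS, 16, 6, 55),
    "6-year, shift one": regroup(AGE_COUNTS, 18, 6, 53),
}

for name, groups in groupings.items():
    lower, upper = modal_class(groups)
    print(f"{name}: modal class {lower}-{upper}, f = {groups[(lower, upper)]}")
```

Running the five groupings side by side shows directly how the modal class moves as the class-interval is lengthened or shifted.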

The mode may be computed from grouped data by means of the
following formula:

$$Mo = l + \frac{f_2}{f_2 + f_1}\, i$$

$Mo$ = the mode
$l$ = lower limit of the class-interval having the largest number of frequencies
$f_1$ = number of items in the class just below the modal group
$f_2$ = number of items in the class just above the modal group
$i$ = size of the class-interval

Using the two-year class-interval as a basis of computing the mode
for data in Table XXXVI, the following substitutions in the formula
are made:

$$Mo = 18 + \frac{16}{16 + 10} \times 2 = 19.23 \text{ years}$$

If a four-year class-interval is used, the mode is 20.07. When the 
length of the class-interval changes, the number of items in each 
class varies also, and the effect is to shift the mode slightly. 
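
The substitution for the two-year class-interval can be checked with a short sketch. The function name `crude_mode` and its parameter names are assumptions made for this example; the arithmetic follows the formula as reconstructed from the worked substitution above.

```python
def crude_mode(lower_limit, f_below, f_above, interval):
    """Mode from grouped data: l + f2 / (f2 + f1) * i.

    lower_limit -- lower limit of the modal class-interval (l)
    f_below     -- frequency of the class just below the modal class (f1)
    f_above     -- frequency of the class just above the modal class (f2)
    interval    -- size of the class-interval (i)
    """
    return lower_limit + f_above / (f_above + f_below) * interval

# Two-year grouping of Table XXXVI: the modal class is 18-19,
# the class below (16-17) has 10 items and the class above (20-21) has 16.
mode_two_year = crude_mode(lower_limit=18, f_below=10, f_above=16, interval=2)
print(round(mode_two_year, 2))
```

The formula pulls the mode toward whichever neighbouring class is the more numerous, which is why the result lies above the mid-point of the 18-19 class.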

Pearson has suggested another formula for ascertaining the 
mode for data distributed in the form of a bell-shaped curve or 
only moderately skewed to the right or left. The formula is as 
follows: 


Mo = Mean — 3 (Mean — Median) 


In a perfectly bell-shaped curve, the three measures of central 
tendency are identical, but in moderately asymmetrical distribu- 
tions they differ by amounts among which there is a fairly constant 
relationship. This formula could not be applied to the felony data 
used above, because the age distribution is skewed far to the right 
in the direction of the higher ages. The constant relation required 
for the application of this formula among the mean, median, and 
mode does not exist in such highly asymmetrical distributions as 
ages of felons. 
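
A brief sketch shows what happens if Pearson's formula is nevertheless applied to the felony ages. It assumes the age frequencies tallied from Table XXXV and uses Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# Number of felons at each age, tallied from the array in Table XXXV.
AGE_COUNTS = {
    16: 2, 17: 8, 18: 9, 19: 10, 20: 11, 21: 5, 22: 9, 23: 5, 24: 2, 25: 3,
    26: 2, 27: 1, 28: 3, 29: 1, 30: 3, 31: 1, 32: 2, 33: 2, 34: 3, 35: 1,
    36: 1, 38: 1, 39: 1, 41: 1, 43: 3, 45: 2, 46: 1, 47: 1, 48: 1, 49: 2,
    54: 1, 55: 2,
}
ages = [age for age, count in AGE_COUNTS.items() for _ in range(count)]

mean_age = mean(ages)
median_age = median(ages)

# Pearson's approximation: Mo = Mean - 3 (Mean - Median)
pearson_mode = mean_age - 3 * (mean_age - median_age)

print("mean:", mean_age)
print("median:", median_age)
print("Pearson mode:", round(pearson_mode, 2))
print("youngest age in the array:", min(ages))
```

For this strongly skewed series the approximation falls below the youngest age actually observed, which is one way of seeing that the constant relation the formula relies on does not hold here.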

All the methods for locating the mode so far discussed give 
approximations to it, but they do not give it exactly. The only 
exact method is to fit an ideal frequency curve to the actual fig- 
ures. 1 This method is complicated and is beyond the scope of the 
discussion at this point. Furthermore, the limited use to which the 
mode may be put does not often warrant the laborious calculations 
required to obtain it exactly. Methods giving approximations, as 
discussed above, are all that the student will ordinarily require in 
social statistics. The methods of arriving at the crude mode may 
be used with either continuous or discontinuous (discrete) data, 
but the exact method should be used only with continuous data. 

1 Yule, op. cit., p. 121.

The mode has some advantages as a rough measure of central
tendency. It marks the approximate location of the most common
value in a series of data, and this may have practical significance.
It may be important to a court to know that the modal age of
felons it has sentenced is about 20 years, whereas the mean age is 
considerably higher. In the study of wages it is sometimes impor- 
tant to know in what wage class the mode falls. Another advantage 
is that, being a position average, the mode is not affected by the 
addition or subtraction of an extreme item any more than it is by 
the addition or subtraction of an item near its own value. A third 
advantage is that the skewness of a distribution may be measured 
in terms of the mode. But the mode has its limitations as an aver- 
age: It is affected more than the mean by changes in the length 
of the class-interval; it cannot be exactly computed without resort 
to complicated and laborious methods of curve-fitting; it does not 
lend itself to the algebraic treatment that may be required in fur- 
ther statistical analysis; it cannot be used in connection with time 
series, because the high points on such a curve represent abnormal 
and not modal conditions. The student should study his data 
carefully. If it appears that the mode is really a significant meas- 
ure of the data, he should determine the mode and make use of it 
in his interpretation. Whether or not the mode is useful in a 
given case will depend upon the data themselves and upon the 
purpose of the investigator. The use of the mode as a method of 
statistical analysis is not a purely mechanical matter; it is a means 
to an end, and, unless determination of the mode throws light on 
the problem, there is no point to finding it. 

3. THE MEDIAN 

The median is the middle value in a series of data, when they
are arranged in ascending order from lowest to highest. It may 
fall on an actual item in the series or it may lie between two items. 
Like the mode, it is a position average and is not seriously affected 
by the addition or subtraction of an item, large or small. In a 
series of data the median is the value above and below which the 
numbers of items are equal. The chances are even that, if an item 
is selected at random from the series, it will be greater or less than 
the median. 

The median may be found for both ungrouped and grouped
data. If their number is small, the items can be arrayed and the 
median found by inspection. In any series the first step is to locate 
the position of the middle value. Take the following numbers: 
4, 6, 7, 9, 11, [?], 15. Their number is odd, and the median value is 9.
But if another item is added above 15, the median value then falls 
between 9 and 11. Referring to Table XXXV, how can the median
position be located? This table contains an even number of items:
100. The median value lies between the 50th and 51st items. For
an even series, the median position may be located by this simple
formula:

$$Md_p = \frac{N + 1}{2} = \frac{100 + 1}{2} = 50.5, \text{ the median position}$$

$Md_p$ = the median position
$N$ = number of items in the series

In case of an odd number of items, the same formula is used, but 
the median position will be the position of the item standing half- 
way between the two extreme items, and will be a real item. In 
the arrayed items of Table XXXV, the median lies between the 
50th and 51st items, both of which happen to be 22 years. Hence, 
the median by inspection is found to be 22 years. If the median 
position had been between two values of unequal size, a further 
step would be necessary. Suppose the median had fallen between 
22 and 23 years. Then the procedure would be to add the two 
values and divide by 2 which would give 22.5. 
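
The procedure for ungrouped data can be sketched as follows. The function names are assumptions for this example, and the ages are rebuilt from the frequencies tallied from Table XXXV; the two short series at the end are hypothetical.

```python
# Number of felons at each age, tallied from the array in Table XXXV.
AGE_COUNTS = {
    16: 2, 17: 8, 18: 9, 19: 10, 20: 11, 21: 5, 22: 9, 23: 5, 24: 2, 25: 3,
    26: 2, 27: 1, 28: 3, 29: 1, 30: 3, 31: 1, 32: 2, 33: 2, 34: 3, 35: 1,
    36: 1, 38: 1, 39: 1, 41: 1, 43: 3, 45: 2, 46: 1, 47: 1, 48: 1, 49: 2,
    54: 1, 55: 2,
}
ages = sorted(age for age, count in AGE_COUNTS.items() for _ in range(count))

def median_position(n):
    """Position of the median in an array of n items: (N + 1) / 2."""
    return (n + 1) / 2

def median_from_array(sorted_items):
    """Median of items already arranged from lowest to highest."""
    position = median_position(len(sorted_items))
    lower = sorted_items[int(position) - 1]   # positions are counted from 1
    if position == int(position):             # odd series: a real item
        return lower
    upper = sorted_items[int(position)]       # even series: average the pair
    return (lower + upper) / 2

print("median position:", median_position(len(ages)))
print("median age:", median_from_array(ages))
print("odd series:", median_from_array([3, 5, 8]))
print("even series:", median_from_array([3, 5, 8, 10]))
```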

When a series contains a large number of items, the data are 
generally grouped into class-intervals. The following table shows 
the age distribution of male workers in Boston who were unem- 
ployed at the time of the census of unemployment in 1930. 

TABLE XXXVII

Unemployed Male Workers in Boston by Age Groups, April, 1930 1

Age            Number of Workers

Total               21,262

10-14                   13
15-19                1,745
20-24                2,968
25-29                2,448
30-34                2,176
35-39                2,323
40-44                2,234
45-49                2,195
50-54                1,786
55-59                1,544
60-64                1,107
65-69                  723




1 Unemployment Bulletin , Massachusetts, United States Bureau 
of the Census, p. 12. 



The general formula for the computation of the median from
grouped data is:

$$Md = l + \frac{\frac{N + 1}{2} - F}{f}\, i$$

$Md$ = the median
$l$ = value of the lower limit of the class-interval which contains the median
$N + 1$ = total number of items plus one
$F$ = sum of all frequencies in classes below $l$
$f$ = number of items in the class-interval containing the median
$i$ = value of the class-interval

Substituting in the formula to find the median for the data in
Table XXXVII, we have:

$$Md = 35 + \frac{\frac{21262 + 1}{2} - 9350}{2323} \times 5 = 35 + 2.8 = 37.8, \text{ the median}$$
The median is found to be 37.8 years — that is, correct to one deci- 
mal place. As reported by the census there were a few unemployed 
persons in the “unknown” class and a few in the class of “70 years 
and over.” These were omitted from the table from which the 
above median was computed. Those whose ages were known to 
be 70 or over could have been included in the table, but the un- 
known group could not be used, because there is no way of dis- 
tributing this group among the established class-intervals. 
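
The computation from the lower limit can be sketched in Python. The class limits and frequencies are those of Table XXXVII; the function name, the list name `CLASSES`, and the way the median class is located by accumulating frequencies are assumptions of this example.

```python
# Table XXXVII: (lower limit of class, number of workers), five-year classes.
CLASSES = [
    (10, 13), (15, 1745), (20, 2968), (25, 2448), (30, 2176), (35, 2323),
    (40, 2234), (45, 2195), (50, 1786), (55, 1544), (60, 1107), (65, 723),
]
WIDTH = 5

def grouped_median(classes, width):
    """Median from grouped data: l + ((N + 1) / 2 - F) / f * i."""
    total = sum(f for _, f in classes)
    target = (total + 1) / 2          # the median position
    below = 0                         # F: items in classes below the current one
    for lower, f in classes:
        if below + f >= target:       # the median falls in this class
            return lower + (target - below) / f * width
        below += f
    raise ValueError("no frequencies given")

md = grouped_median(CLASSES, WIDTH)
print(round(md, 1))
```

Accumulating the frequencies from the bottom reproduces the value F = 9,350 used in the substitution, so the sketch and the hand computation follow the same steps.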

The student will have noticed that the formula utilizes the class- 
intervals and frequencies below the median. As a matter of fact, 
the median may be computed by using the class-intervals and the 
frequencies above it in a similar manner. Changing certain letters 
in the formula for purposes of clarity, and changing the first plus
sign to a minus, the formula for using the upper half of the data 
would be as follows: 


$$Md = L - \frac{\frac{N + 1}{2} - F_1}{f}\, i$$

$F_1$ = sum of all items in classes above $L$
$L$ = value of the upper limit of the class-interval containing the median

Substituting in this formula:

$$Md = 39.99 - \frac{\frac{21262 + 1}{2} - 9589}{2323} \times 5 = 39.99 - \frac{1042.5}{2323} \times 5 = 37.75, \text{ the median}$$



Note that in the table the class-interval containing the median is 
indicated to be 35-39. That means that all ages of 35 and less
than 40 are included in this class-interval. When computing the 
median from the top down, it is necessary to express the upper 
limit more exactly than the figure 39. Hence, it is indicated here 
to be 39.99, and the median by this method is 37.75, or five- 
hundredths less than the median by the previous computation, 
but this difference is due only to the fact that the upper limit of 
the class-interval was expressed to the hundredth of a year. It 
might have been expressed to the ten-thousandth of a year, in 
which case the difference between the first and second methods 
would have been five ten-thousandths. The important point is that, 
for all practical purposes, the medians are the same. It is more 
common, however, to find the median computed from the bottom 
upwards. 
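
The computation from the top down can be sketched in the same way. The frequencies are those of Table XXXVII; the function name and the `upper_adjust` parameter, which expresses the upper limit as 39.99 rather than 40, are assumptions introduced for this example.

```python
# Table XXXVII: (lower limit of class, number of workers), five-year classes.
CLASSES = [
    (10, 13), (15, 1745), (20, 2968), (25, 2448), (30, 2176), (35, 2323),
    (40, 2234), (45, 2195), (50, 1786), (55, 1544), (60, 1107), (65, 723),
]
WIDTH = 5

def grouped_median_from_above(classes, width, upper_adjust=0.01):
    """Median from grouped data: L - ((N + 1) / 2 - F1) / f * i."""
    total = sum(f for _, f in classes)
    target = (total + 1) / 2
    above = 0                                  # F1: items in classes above
    for lower, f in reversed(classes):
        if above + f >= target:                # the median falls in this class
            upper = lower + width - upper_adjust   # e.g. 39.99 for 35-39
            return upper - (target - above) / f * width
        above += f
    raise ValueError("no frequencies given")

md_from_above = grouped_median_from_above(CLASSES, WIDTH)
print(round(md_from_above, 2))
```

Setting `upper_adjust` to a smaller value moves the upper limit closer to 40 and changes the result only in the later decimal places.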

When the data are grouped the median may also be located 
with approximate accuracy by graphic methods. The most common 
method for doing this is the use of “less than” and “more than” 
cumulative frequency curves drawn on the same paper. Table 
XXXVIII gives the cumulative frequencies and Figure XLIX 
locates the median graphically below the point of intersection. 
The exact value of the median cannot be determined from this 
graph, but apparently it is about the same as the computed median, 
that is, 37.8. The two cumulative frequency curves intersect at a 
point which divides the total frequencies in half and, consequently, 
at a point whose value is the middle value of the series. It should 
be noticed that the curves must be plotted, not at the mid-points 
of the spaces representing age periods, but on the ordinate repre- 
senting the lower limit of the class-interval. If they are plotted 
at the mid-points of the spaces, the intersection of the two curves
is to the right of the middle value; that is, the value indicated is
too high. The chief value of the graphic method of locating the
median is for interpretation to persons reading a report or listening
to one which may be made orally. A chart can be presented
which gives proper perspective and impresses the observer immediately
as to the value of the median. If the numerical value is
all that is wanted, it is easier to compute the median by formula.

TABLE XXXVIII

Cumulative Frequencies, Unemployed Male Workers in Boston

                              (I)              (II)
Age in Years       f       “Less Than”    “At or More Than”
                             Number           Number

10-14               13            0           21,262
15-19            1,745           13           21,249
20-24            2,968        1,758           19,504
25-29            2,448        4,726           16,536
30-34            2,176        7,174           14,088
35-39            2,323        9,350           11,912
40-44            2,234       11,673            9,589
45-49            2,195       13,907            7,355
50-54            1,786       16,102            5,160
55-59            1,544       17,888            3,374
60-64            1,107       19,432            1,830
65-69              723       20,539              723
                             21,262                0
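
The graphic method can be imitated numerically. The sketch below builds the two cumulative series from the class frequencies and finds where they cross, assuming the curves are drawn as straight segments between the points plotted at the lower class limits; the function and variable names are illustrative assumptions.

```python
# Lower class limits (with the closing limit 70) and class frequencies.
BOUNDS = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
FREQS = [13, 1745, 2968, 2448, 2176, 2323, 2234, 2195, 1786, 1544, 1107, 723]

def cumulative_curves(freqs):
    """'Less than' and 'at or more than' cumulative frequencies at each limit."""
    total = sum(freqs)
    less_than = [0]
    for f in freqs:
        less_than.append(less_than[-1] + f)
    at_or_more = [total - c for c in less_than]
    return less_than, at_or_more

def crossing(bounds, less_than, at_or_more):
    """Age at which the two curves intersect, by linear interpolation."""
    for k in range(len(bounds) - 1):
        d0 = less_than[k] - at_or_more[k]
        d1 = less_than[k + 1] - at_or_more[k + 1]
        if d0 <= 0 <= d1:  # the curves cross within this class
            step = bounds[k + 1] - bounds[k]
            return bounds[k] + step * (-d0) / (d1 - d0)
    raise ValueError("curves do not cross")

less_than, at_or_more = cumulative_curves(FREQS)
print(less_than)
print(at_or_more)
print(round(crossing(BOUNDS, less_than, at_or_more), 2))
```

The crossing falls in the 35-39 class, close to the median computed by formula, which is what the chart is meant to convey at a glance.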

Quartiles, deciles, and percentiles are frequently discussed in 
connection with the median, because they are associated with it. 
But these measures are not, in fact, measures of central tendency, 
but of dispersion. Kelley recognizes this, though he includes them 
in his chapter on measures of central tendency . 2 Secrist uses the 
concept of these methods which is utilized here and discusses it in 
his chapter on dispersion . 8 There is no more reason for the dis- 
cussion of quartiles, deciles, and percentiles in juxtaposition to the 
discussion of the median than there is for the discussion of the 
standard deviation on the same page with discussion of the mean. 
These methods will be considered in the next chapter. 

As an average the median has at least two advantages not 
equally possessed by 'other averages: (1) it is easily calculated, 
and (2) it is not significantly affected by a few extreme items in a 
series. The median has been widely used in anthropometric and 

2 Kelley, Truman L., Statistical Method , p. 59. New York: Macmillan, 1923. 

8 Secrist, Horace, An Introduction to Statistical Method, Chap. X. New York: 
Macmillan, 1929. 



STATISTICAL ANALYSIS 


213 


[Figure XLIX. Location of the Median by Means of Cumulative Frequency Curves. Vertical axis: Y, number of unemployed. Horizontal axis: age classes from 10 to 70. Curves shown: "less than" and "at or more than" cumulative frequencies, with the line for 50 per cent of frequencies and the median marked.] 





educational measurements to locate the point above and below 
which 50 per cent of the items lie. In establishing norms of dis- 
tribution of traits the median value of the series is frequently 
important. But the median should not be used without careful 
consideration of other questions. If the frequency distribution with 
which the student is working is bi-modal, the median may fall at 
a point not representative of the series. It may be unrepresentative 
in a series with a single mode, but caution in its use is particularly 
important if the series shows two or more modal points. The 
median does not lend itself to numerical and algebraic treatment; 
the algebraic sum of the deviations of the individual items from 
the median is not zero. Sometimes the computation of an average 
is only one step in a problem requiring statistical analysis, in which 
case it may be necessary to choose an average, such as the mean, 
which lends itself to algebraic uses. 
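
The two properties of the median just described can be checked on a small series. The following is a minimal sketch in Python and is not part of the original text; the wage figures are invented for illustration, with one extreme item placed at the end of the series.

```python
from statistics import mean, median

# Hypothetical weekly wages in dollars (illustrative only);
# the last item is an extreme value.
wages = [18, 20, 21, 22, 24, 25, 94]

m = mean(wages)      # arithmetic mean, pulled upward by the extreme item
md = median(wages)   # middle item of the ordered series

# Algebraic sum of the deviations: zero about the mean,
# but in general not zero about the median.
dev_from_mean = sum(x - m for x in wages)
dev_from_median = sum(x - md for x in wages)

print("mean:", m, "median:", md)
print("sum of deviations from mean:", dev_from_mean)
print("sum of deviations from median:", dev_from_median)
```

In this invented series the single extreme wage raises the mean well above the bulk of the items, while the median stays at the middle item.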

4. THE ARITHMETIC MEAN 

The arithmetic mean is a measure of central tendency derived 
from consideration of all the values in the series. It is affected by 
the size of every item included in the computation. When the 
word “mean” or “average” is used without qualification, the arithmetic 
mean is usually meant. Among statisticians it is the most frequently 
used average. Yule⁴ points out that the arithmetic mean 
fulfills more of the conditions of an average than does any other 
measure of central tendency. He names six conditions: (1) an 
average should be rigidly defined; (2) it should be based upon all 
observations; (3) it should be readily comprehensible; (4) it 
should be easily and rapidly calculated; (5) it should be as little 
affected by fluctuations of sampling as possible; and (6) it should 
lend itself readily to algebraic treatment. The arithmetic mean 
fulfills all these conditions, except (5) which may not be fulfilled 
if there is a number of extremely small or extremely large items. 
The median would be less affected under such conditions. 

The mean may be computed from either ungrouped or grouped 
data, but the methods are somewhat different. Referring to Table 
XXXV, the mean age of 100 felons would be the sum of the ages 
divided by 100. The formula is: 


\[ M = \frac{\Sigma X}{N} \]

M = the mean 
X = the individual item in the series 
\Sigma = “the sum of” the individual items 
N = number of items 

Hence, 

\[ M = \frac{\Sigma X}{100} = 26.37, \text{ the mean} \]

4 Op. cit., pp. 108, 109, 119, 120. 

This is the absolute mean and is the one commonly thought of 
when the mean is mentioned. If only a small number of items is 
involved, the sum of the individual items may be easily obtained, 
but, if the items happen to run into the thousands, the work would 
be considerable. Therefore, it is desirable to have a method of 
computing the mean from data grouped into class-intervals. 

To illustrate the method of computing the mean from grouped 
data we shall use the data for unemployed workers in Boston as 
given in Table XXXVII. The formula differs slightly from that 
for computing the mean for ungrouped data. It is as follows: 


\[ M = \frac{\Sigma fm}{N} \]

in which m is the mid-point of the class-interval and f the number 
of items within the class-interval. The other symbols have the 
same meaning as in the previous formula. In order to compute 
the mean from grouped data it is necessary to set up a table. The 
tabular form makes the process clearer and enables the student 
more easily to check the accuracy of his work. To compute the 
mean age of unemployed workers in Boston, the following table 
is given: 

TABLE XXXIX 

Computation of the Mean by the Long Method for Grouped Data: Unemployed 
Workers in Boston, Total 21,262 

Age        Mid-Point of      Number of      Product of Columns 
(years)    Class-Interval    Unemployed     (2) and (3) 
           m                 f              fm 
(1)        (2)               (3)            (4) 

10-14      12.5                  13             162.5 
15-19      17.5               1,745          30,537.5 
20-24      22.5               2,968          66,780.0 
25-29      27.5               2,448          67,320.0 
30-34      32.5               2,176          70,720.0 
35-39      37.5               2,323          87,112.5 
40-44      42.5               2,234          94,945.0 
45-49      47.5               2,195         104,262.5 
50-54      52.5               1,786          93,765.0 
55-59      57.5               1,544          88,780.0 
60-64      62.5               1,107          69,187.5 
65-69      67.5                 723          48,802.5 

Total                        21,262         822,375.0 






Substituting in the formula, we have: 


\[ M = \frac{822,375.0}{21,262} = 38.68 \text{ years, the mean} \]

The use of this table and formula shortens the work of computing 
a mean for 21,262 items. It could be done by adding all the sepa- 
rate items and dividing by 21,262, but the work would be much 
greater. If the data are given in a frequency table, the mean could 
not be computed by adding the separate frequencies and dividing 
by the total number. It is, therefore, necessary to have a method 
of computing the mean from grouped data. But one caution should 
be kept in mind when computing a mean by this method. The mid- 
point of each class-interval was multiplied by the number of items 
in the class-interval. For example, in the first class-interval of 
the table the mid-point is 12.5, and this number is multiplied by 
13. It is assumed that some of the ages are less than 12.5 years 
and that some are greater but that the sum of the differences of 
those less than 12.5 is equal to the sum of the differences of those 
above 12.5. We have assumed an even distribution of the items 
throughout the class-interval. This assumption is made regarding 
each class-interval in the table. If for any reason it were likely 
that the items in each class-interval tended to concentrate at either 
the lower or the upper end of the class-interval, the computed 
mean would probably be erroneous, because the assumption of 
even distribution would be unjustified. It was pointed out in 
Chapter VI⁵ that uneven distributions do occur. The student 
should consider carefully whether or not his frequency distribu- 
tion is of this sort. If it is suspected that the data in a frequency 
distribution may have this tendency and there is no way of re- 
arranging the class-intervals because of the absence of the original 
data, then some reservation is in order as to whether the computed 
mean is exact or not.⁶ That reservation may properly be 
made regarding the mean age of unemployed workers above, be- 
cause ages, even in the United States census, have been known to 
show some concentration around ages divisible by 5. No redistri- 
bution can be made of the class-intervals which would test this fact, 

5 See p. 148. 

6 Sheppard has suggested a correction for the standard deviation of a distribution 
characterized by unevenness within the class-interval. In such cases the 
true standard deviation is given by \( \sigma^2 = \sigma_1^2 - \frac{i^2}{12} \). See Yule, op. cit., pp. 211, 212. 





because the original data are not published by the census; it would 
not be possible to put the ages divisible by 5 at the mid-point of 
class-intervals. Hence, it may be that 38.68 years is not the exact 
mean but only an approximation to it. 
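
The long method can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original text; the mid-points and frequencies are those of Table XXXIX, and the function name `grouped_mean` is an assumption introduced here.

```python
# Mid-points (m) and frequencies (f) taken from Table XXXIX.
midpoints = [12.5, 17.5, 22.5, 27.5, 32.5, 37.5,
             42.5, 47.5, 52.5, 57.5, 62.5, 67.5]
frequencies = [13, 1745, 2968, 2448, 2176, 2323,
               2234, 2195, 1786, 1544, 1107, 723]


def grouped_mean(midpoints, frequencies):
    """Long method: M = (sum of f*m) / N, where N is the sum of the f's.

    Every item in a class-interval is treated as if it lay at the mid-point,
    so the result is only as good as that assumption.
    """
    n = sum(frequencies)
    sum_fm = sum(f * m for m, f in zip(midpoints, frequencies))
    return sum_fm / n


sum_fm = sum(f * m for m, f in zip(midpoints, frequencies))
long_mean = grouped_mean(midpoints, frequencies)
print(sum_fm, round(long_mean, 2))
```

Because each class-interval is represented only by its mid-point, the sketch carries the same reservation as the table: any concentration of ages away from the mid-points makes the result an approximation.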

Although the preceding method of computing the mean saves 
time as compared with the method of adding the separate items 
and dividing by the number of items, it is still a “long method” 
for computing the mean. It involves the use of large numbers and 
long multiplications, and when large numbers are used the chance 
of mistakes is increased. The statistician should employ all possi- 
ble methods to eliminate opportunity for errors. There is a shorter 
method, sometimes called the “deviation method,” of computing 
the mean. 

The algebraic sum of the deviations from the mean is zero. This 
fact may be used to compute the mean by a “short method.” The 
student may take any small number of items for which he has 
computed a mean, express the deviations of each item from the 
mean with their appropriate signs, and add the deviations alge- 
braically. The result will be zero. 

Since we know that the sum of the deviations from the mean 
is equal to zero, we can take any arbitrary origin in the frequency 
distribution, assume the mid-point represented by this origin to be 
the mean, compute a correction factor, add the correction factor 
to the assumed mean, and the result is the true mean of the fre- 
quency distribution. This principle will be illustrated by the same 
data as were used to illustrate the computation of the mean by the 
long method. 7 
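
A brief sketch of why the correction factor restores the true mean may be helpful; it is offered as an illustration and is not the proof given by Mills. The symbols are assumed as follows: \(M_1\) is the assumed mean, \(i\) the size of the class-interval, \(d\) the deviation of a mid-point \(m\) from \(M_1\) in class-interval units, \(f\) the frequency, and \(N = \Sigma f\).

```latex
\begin{aligned}
m &= M_1 + i\,d \\
M &= \frac{\Sigma f m}{N}
   = \frac{\Sigma f\,(M_1 + i\,d)}{N}
   = \frac{M_1\,\Sigma f}{N} + i\,\frac{\Sigma f d}{N}
   = M_1 + i\,\frac{\Sigma f d}{N}
\end{aligned}
```

The last term is the correction factor expressed in the original units. If \(M_1\) happens to equal the true mean, \(\Sigma fd = 0\) and the correction vanishes, which is the zero-sum property of deviations from the mean.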

The assumed mean in Table XL is 37.5 years. That becomes the 
arbitrary origin from which to measure step-deviations, that is, 
deviations from the mid-point of the class-interval containing 
the assumed mean expressed in class-interval units. In column 
(4) the arbitrary origin is marked 0 and the step-deviations of 
class-intervals whose values are lower than that of the class in 
which the mean is assumed to be are marked minus. The step- 
deviations of class-intervals higher in value than the origin have 
a plus sign, but, following conventional procedure, the sign is not 
expressed.⁸ The frequencies, f, are multiplied by their respective 
step-deviations, d, and the product of any f and any d takes the 

7 For more detailed proof of the short method for computing the mean, see 
Mills, Frederick C., Statistical Methods. New York: Henry Holt & Co., 1924. 

8 In this book, wherever the algebraic sign in front of a quantity is 
unexpressed it is plus. 



TABLE XL 

Computation of the Mean by the Short Method 

Age        Mid-Point of      Frequency    Deviations from Assumed       fd 
(years)    Class-Interval                 Mean in Class-Interval 
           m                 f            Units, d                   -          + 
(1)        (2)               (3)          (4)                       (5)        (6) 

10-14      12.5                  13       -5                            65 
15-19      17.5               1,745       -4                         6,980 
20-24      22.5               2,968       -3                         8,904 
25-29      27.5               2,448       -2                         4,896 
30-34      32.5               2,176       -1                         2,176 
35-39      37.5               2,323        0 
40-44      42.5               2,234        1                                    2,234 
45-49      47.5               2,195        2                                    4,390 
50-54      52.5               1,786        3                                    5,358 
55-59      57.5               1,544        4                                    6,176 
60-64      62.5               1,107        5                                    5,535 
65-69      67.5                 723        6                                    4,338 

Total                        21,262                                 23,021     28,031 


sign of the d. For convenience two columns are provided in the 
table, one for the -fd's and the other for the +fd's. The totals of 
columns (3), (5), and (6) are obtained. Now we are ready to 
substitute in the formula for the short method, which is: 

\[ M = M_1 + c \]

in which M is the mean, \(M_1\) is the assumed mean, or 37.5 in this 
case, and c is the correction factor. 

\[ c = \frac{\Sigma fd}{N}, \text{ in steps or class-intervals} \]

\[ c = \frac{\Sigma fd}{N} \, i, \text{ in years} \]

\[ M = 37.5 + \frac{28,031 - 23,021}{21,262} \times 5 = 38.68 \text{ years} \]

In order to reduce the correction factor to terms of years, it must 
be multiplied by the size of the class-interval, 5. This is indicated 
above by the symbol, i. It should be noted that the correction 
factor is added in the algebraic sense; that is, with due regard to 
signs. If the assumed mean is higher than the true mean, the 
correction factor will be a minus quantity. The mean computed 
by this short method is exactly the same as by the long method. 




In order to test whether or not the results are the same, when 
different arbitrary origins are used, we might try several others. 
The author has computed the mean from 47.5 as assumed mean, 
and the result is 38.68 years, the same as the preceding result. 
The result will be the same, regardless of what the assumed mean 
is, but, if the assumed mean is taken near the true mean, the 
figures dealt with are smaller and, consequently, more readily 
handled. 
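
The independence of the result from the choice of origin can be checked mechanically. The following Python sketch is illustrative and not part of the original text; the data are those of Table XL, and the function name `short_method_mean` is an assumption introduced here.

```python
# Mid-points (m) and frequencies (f) taken from Table XL.
midpoints = [12.5, 17.5, 22.5, 27.5, 32.5, 37.5,
             42.5, 47.5, 52.5, 57.5, 62.5, 67.5]
frequencies = [13, 1745, 2968, 2448, 2176, 2323,
               2234, 2195, 1786, 1544, 1107, 723]


def short_method_mean(midpoints, frequencies, assumed_mean, interval):
    """Short (deviation) method: M = M1 + (sum of f*d / N) * i."""
    n = sum(frequencies)
    # Step-deviations d: distance of each mid-point from the assumed mean,
    # measured in class-interval units (minus below the origin, plus above).
    steps = [(m - assumed_mean) / interval for m in midpoints]
    correction_in_steps = sum(f * d for f, d in zip(frequencies, steps)) / n
    return assumed_mean + correction_in_steps * interval


mean_from_375 = short_method_mean(midpoints, frequencies, 37.5, 5)
mean_from_475 = short_method_mean(midpoints, frequencies, 47.5, 5)
print(round(mean_from_375, 2), round(mean_from_475, 2))
```

With the origin at 47.5 the correction factor is a minus quantity, yet the two results agree, apart from the negligible rounding inherent in machine arithmetic.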

The same caution as to the concentration of values at some 
point in the class-interval holds for the short method as for the 
long method. The class-interval should not be too large, and, if 
it is known that concentration of the values of items occurs at a 
certain place, this point should when possible be placed in the 
middle of the class-interval. 

There is another variation of the mean, but it is still an arithmetic 
mean. That is the weighted mean. The method of computing 
the weighted mean resembles the method of computing 
an ordinary mean from a frequency distribution, but in practice 
the concept is more restricted and should not be applied to a 
frequency distribution. The concept should be used in connection 
with a mean computed from rates or ratios, and it is widely used 
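in the construction of index numbers. The formula is: 

\[ M_w = \frac{\Sigma WX}{\Sigma W} \]

in which X is the rate or ratio to be averaged and W is its weight. 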


Computation of a weighted mean will be illustrated from a series 
of index numbers for various types of public welfare work in 
Indiana. These indexes are based upon the number of clients per 
100,000 population of the state who were under the care of the 
agencies at the end of the fiscal year, 1930. 

\[ M_w = \frac{12,983.30}{100.0} = 129.83 \]

The weighted mean for these index numbers is 129.83. The per- 
centages in column (2) represent changes for the same institutions 
and agencies from the numbers of people they were serving in 
1913 (in each case the number served in 1913 is taken as 100 per 
cent). Consequently, the weighted mean shows that the same 



TABLE XLI 

Computation of the Weighted Mean Index Number for the Number of Clients 
under the Care of Public Welfare Agencies in Indiana, September 30, 1930. 
Base, 1913 

Agency                                       Index      Weight: Per Cent     Col. (2) X Col. (3) 
                                             Number     of Total Clients 
                                             X          W                    WX 
(1)                                          (2)        (3)                  (4) 

Total                                        1,925.4    100.0                12,983.30 

Hospitals for Insane                            97.9     23.8                 2,330.02 
School for Feeble-minded Youth                 107.7      5.4                   581.58 
Colony for Epileptics                          307.6      2.7                   830.52 
Soldiers' Home                                  32.5      1.2                    39.00 
Soldiers' and Sailors' Orphans' Home           117.3      2.2                   258.06 
Tuberculosis Sanatorium                        124.4       .6                    74.64 
School for the Deaf                            116.8      1.4                   163.52 
School for the Blind                            86.9       .5                    43.45 
State Prison                                   168.7      8.1                 1,366.47 
Reformatory                                    177.9      6.9                 1,227.51 
Women's Prison                                 131.4       .7                    91.98 
Boys' School                                    80.0      1.7                   136.00 
Girls' School                                  113.6      1.3                   147.68 
Poor Asylums                                   133.5     16.9                 2,256.15 
Dependent and Neglected Children, Wards        129.2     26.6                 3,436.72 


agencies were caring for 29.83 per cent more persons in 1930 in 
proportion to population of the state than in 1913. The simple 
mean of the index numbers is 128.36, or more than 1 per cent 
less than is shown by the weighted mean. This is not a large 
difference, but in some cases the weighted mean may vary much 
more from the simple mean. The weighted mean is used exten- 
sively in finding the average price of a commodity on a certain 
day. For example, the price of eggs of the same grade will vary 
in price among a number of stores, and variations among different 
grades will be still larger. The only way to arrive at a figure which 
fairly expresses the general price of eggs in a city on a given day 
is to weight the price of different grades and of prices for like 
grades at different stores by the quantities sold on that day. In 
constructing an index number of the cost of living, the United 
States Bureau of Labor Statistics made extended studies of the 
quantities of different articles used in the family budgets of a large 
sample of families. Weights were determined on the basis of the 
quantities used, and then the average prices of the commodities 
were multiplied by the weights to give proper importance to each 





item. An index number which would fairly represent the cost of 
living could not be computed without using a system of weights, 
based upon the relative importance of the items included. 

The quantity used for a weight is to a considerable extent arbi- 
trary. Whatever it is, it represents the worker’s estimate of the 
relative importance of the items which are to enter into the 
weighted mean. Pounds, inches, dollars, ratios, etc., may constitute 
the weights. Because of the arbitrary element in weighting, it is 
sometimes said that better results would be obtained by neglecting 
weights. But this is obviously fallacious, because differential im- 
portance is given to the items in a series of rates or ratios regard- 
less of the presence or absence of a weighting plan. For example, 
the index number for patients in the Indiana Colony for Epileptics 
was 307.6 in 1930, whereas the index number for patients in the 
hospitals for the insane was 97.9. The number of patients in the 
Colony for Epileptics was 767 at the end of the fiscal year, 1930, 
while the number of patients in the hospitals for the insane at the 
same time was 6,839. The rate of increase for the Colony for 
Epileptics, compared with 1913, was very large, whereas the hos- 
pitals for the insane show a slight decline. Whether or not a 
system of weights is used, there is weighting — that is, the rate of 
change in the population of the Colony for Epileptics is given 
an importance which in fact it does not have. If instead of finding 
the mean index number for the 15 public welfare agencies and 
institutions, it was desired to find only the mean index number 
for the hospitals for the insane and the Colony for Epileptics, the 
importance of weighting is made still clearer. The simple mean 
of 97.9 and 307.6 is 202.8, which implies a doubling of the 
number of patients in the seven institutions represented since 1913. 
But the population of the hospitals for the insane has actually 
declined slightly relative to population in Indiana; it is only the 
population of the Colony for Epileptics that has shown a rapid 
increase, and the number of patients in the Colony for Epileptics 
in 1913 was small. If new percentage weights are computed for 
epileptics and insane only, and the index numbers are multiplied 
by these, the weighted mean index of population of these two 
types of public welfare institutions is 119.1, as compared with an 
unweighted mean of 202.8. Whether in computing means of rates 
and ratios we consciously use weights, or whether we do not, the 
resulting mean is weighted. The problem then becomes one of 





devising a rational system of weights instead of leaving the result 
to chance weighting. 9 
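
The contrast between the weighted and the simple mean can be reproduced with a short Python sketch. It is illustrative only and not part of the original text. The index numbers and percentage weights are read from Table XLI. For the two-institution case the sketch assumes that the numbers of patients (767 and 6,839) serve as weights, which comes to the same thing as the new percentage weights mentioned above, apart from rounding. The function name `weighted_mean` is introduced here for illustration.

```python
# Index numbers (X) and percentage weights (W) from Table XLI, in table order.
index_numbers = [97.9, 107.7, 307.6, 32.5, 117.3, 124.4, 116.8, 86.9,
                 168.7, 177.9, 131.4, 80.0, 113.6, 133.5, 129.2]
weights = [23.8, 5.4, 2.7, 1.2, 2.2, 0.6, 1.4, 0.5,
           8.1, 6.9, 0.7, 1.7, 1.3, 16.9, 26.6]


def weighted_mean(values, weights):
    """Weighted mean: (sum of W*X) / (sum of W)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# All fifteen agencies: weighted versus simple mean of the index numbers.
all_weighted = weighted_mean(index_numbers, weights)
all_simple = sum(index_numbers) / len(index_numbers)

# Colony for Epileptics and hospitals for the insane only.
# Assumption: weights are the numbers of patients at the end of fiscal 1930.
two_weighted = weighted_mean([307.6, 97.9], [767, 6839])
two_simple = (307.6 + 97.9) / 2

print(round(all_weighted, 2), round(all_simple, 2))
print(round(two_weighted, 1), round(two_simple, 1))
```

For the two institutions the weighted result is about 119, while the unweighted mean is above 200; the small difference from the 119.1 given in the text arises from rounding the percentage weights.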

The advantages and limitations of the arithmetic mean may now 
be summarized. It is (1) the most widely used of all averages; 
(2) it has a definite value; (3) it lends itself to algebraic treat- 
ment; (4) it is easily computed from either ungrouped or grouped 
data; (5) unless some other form of an average is specifically 
indicated or only a rough approximation to the central tendency 
is required, the mean is the best average to use. One caution should 
be borne in mind: the mean is sensitive to extreme values in the 
series and may not be truly representative, in which case some 
other measure of central tendency should be used along with it. 

5. THE GEOMETRIC MEAN 

For a series of items the geometric mean is the nth root of the 
product of the items. If the geometric mean is wanted for 10 items, 
the items are multiplied together and the 10th root taken. In 
terms of the formula, it may be expressed thus: 

\[ M_g = \sqrt[n]{(X_1)(X_2)(X_3) \cdots (X_n)} \]

To take a simple example: 

\[ M_g = \sqrt[3]{(3)(6)(9)} = \sqrt[3]{162} = 5.45 \]

If there are many items and large numbers are involved, the 
difficulty in extracting the nth root becomes very great. In such 
cases logarithms may be used. The arithmetic mean of the 
logarithms of the items is the logarithm of the geometric 
mean of the items; the logarithms may be found by consulting a 
logarithmic table. The geometric mean may be computed for 
either ungrouped or grouped data. The formula differs slightly 
from that given above and is as follows: 

\[ \log M_g = \frac{\Sigma (f \log m)}{N} \]

In order to compare it with the arithmetic mean, the data from 
Table XXXIX will be used: 

9 For further discussion of weighting see Chaddock, op. cit., pp. 193-196; 
Secrist, Horace, op. cit., pp. 241-246; Yule, op. cit., pp. 220-225. 




TABLE XLII 

The Geometric Mean Age of Unemployed Workers in Boston Computed with 
the Use of Logarithms 

Age in Years    Mid-Point    Number 
                m            f          log m        f log m 
(1)             (2)          (3)        (4)          (5) 

10-14           12.5             13     1.096910        14.259830 
15-19           17.5          1,745     1.243038     2,169.101310 
20-24           22.5          2,968     1.352183     4,013.279144 
25-29           27.5          2,448     1.439333     3,523.487184 
30-34           32.5          2,176     1.511883     3,289.857408 
35-39           37.5          2,323     1.574031     3,656.474013 
40-44           42.5          2,234     1.628389     3,637.821026 
45-49           47.5          2,195     1.676694     3,680.343330 
50-54           52.5          1,786     1.720159     3,072.203974 
55-59           57.5          1,544     1.759668     2,716.927392 
60-64           62.5          1,107     1.795880     1,988.039160 
65-69           67.5            723     1.829304     1,322.586792 

Total                        21,262                 33,084.380563 


\[ \log M_g = \frac{33,084.380563}{21,262} = 1.556029 \]

\[ M_g = 35.97 \text{ years} \]

The geometric mean age is smaller by 2.7 years than the arith- 
metic mean. It is characteristic of the geometric mean that it gives 
less weight to extreme deviations than does the arithmetic mean, 
which results in a somewhat lower average. In the above problem 
it will be noticed that the logarithm of the mid-point of each class- 
interval is taken. Then the logarithm of this number is multiplied 
by the frequency of the class-interval. The sum of these products 
divided by the total frequencies gives the logarithm of the geo- 
metric mean. 
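
The logarithmic procedure just described can be sketched in Python. The sketch is illustrative and not part of the original text; it uses common (base 10) logarithms, as a logarithmic table would, with the mid-points and frequencies of Table XLII. The function names are assumptions introduced here.

```python
import math

# Mid-points (m) and frequencies (f) taken from Table XLII.
midpoints = [12.5, 17.5, 22.5, 27.5, 32.5, 37.5,
             42.5, 47.5, 52.5, 57.5, 62.5, 67.5]
frequencies = [13, 1745, 2968, 2448, 2176, 2323,
               2234, 2195, 1786, 1544, 1107, 723]


def geometric_mean(items):
    """Ungrouped data: antilog of the arithmetic mean of the logarithms."""
    mean_log = sum(math.log10(x) for x in items) / len(items)
    return 10 ** mean_log


def grouped_geometric_mean(midpoints, frequencies):
    """Grouped data: log Mg = (sum of f * log m) / N."""
    n = sum(frequencies)
    mean_log = sum(f * math.log10(m) for m, f in zip(midpoints, frequencies)) / n
    return 10 ** mean_log


simple_example = geometric_mean([3, 6, 9])   # cube root of 162
boston_gm = grouped_geometric_mean(midpoints, frequencies)
print(round(simple_example, 2), boston_gm)
```

Carried to full machine precision, the grouped result may differ in the second decimal place from the figure obtained with a printed logarithmic table.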

Some social series show an aggregate increase over a period of 
time. Such are population, per capita income in the United States, 
and publicly supported social welfare activities. If it is assumed 
that the rate of change is the same in each year of a period under 
consideration and this rate of change is unknown but is to be 
determined, then the geometric method is the one to apply. The 
formula is as follows: 


\[ P_1 = P_0 (1 + r)^n \]

P_1 = population at the end of n years 
P_0 = population at beginning of period 
r = rate of change per year 





n = number of years, used as the power to which the expres- 
sion in the parentheses is to be raised 

Or, using logarithms, the formula may be written: 

\[ \log P_1 = \log P_0 + n \log(1 + r), \text{ amount of change} \]

\[ \log(1 + r) = \frac{\log P_1 - \log P_0}{n}, \text{ rate of change} \]

Where a power larger than a cube is used, the student will find 
the use of logarithms indispensable. Suppose we want to know the 
annual rate of growth of the population of the United States from 
1920 to 1930. The substitutions would be as follows: 

\[
\begin{aligned}
\log(1 + r) &= \frac{\log(122,775,046) - \log(105,710,620)}{10.25} \\
            &= \frac{8.089092 - 8.024116}{10.25} \\
            &= .006339, \text{ logarithm of the rate} \\
(1 + r) &= 1.0147 \\
r &= 1.0147 - 1 \\
  &= .0147, \text{ or 1.47 per cent increase per year}
\end{aligned}
\]

The importance of the geometric mean in estimating this type of 
change is emphasized by the fact that the arithmetic mean would 
be 1.57 per cent for the period of 10.25 years. The geometric 
mean allows for the changing volume of population each year, 
while the arithmetic mean uses the population of 1920 as 100 per 
cent for each succeeding year. 
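
The two rates can be compared directly in a short Python sketch, offered as an illustration and not part of the original text. The populations and the period of 10.25 years are those used above; the variable names are introduced here.

```python
import math

p0 = 105_710_620   # population at beginning of period (1920)
p1 = 122_775_046   # population at end of period (1930)
years = 10.25      # length of the period, n

# Geometric method: log(1 + r) = (log P1 - log P0) / n
log_one_plus_r = (math.log10(p1) - math.log10(p0)) / years
geometric_rate = 10 ** log_one_plus_r - 1

# Arithmetic method: total proportional increase spread evenly over the
# period, every year being referred to the 1920 population as 100 per cent.
arithmetic_rate = (p1 / p0 - 1) / years

print(f"geometric:  {geometric_rate:.4%} per year")
print(f"arithmetic: {arithmetic_rate:.4%} per year")
```

The geometric rate compounds on the population of each preceding year, which is why it is the smaller of the two figures.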

The investigator should be cautioned regarding this use of the 
geometric mean, however. As a matter of fact, population does not 
change at the same rate each year over a long period of time. Its 
growth is affected by immigration laws, by the spread of birth 
control, by the business cycle, and by wars. Other social series 
which show an upward trend over a long period of time also may 
have irregular rates of change. Consequently, the choice of the 
geometric mean as a single method of estimating the rate of 
change depends upon the judgment of the worker as to whether 
or not it really is the best method. Yule points out that, even if 
the geometric mean rate of change in population be a close ap- 
proximation to the facts for a whole country, it cannot be assumed 
to represent the rate of change in smaller geographic subdivisions; 
these have special conditions which affect their rates of change. 





The worker must constantly exercise his judgment to avoid unwarranted 
assumptions.¹⁰ 

The geometric mean, then, has some uses, in which it is superior 
to other averages. It can be used for estimating change in an aug- 
menting social series, and it is useful in averaging ratios such as 
index numbers. It has the disadvantage, however, of being un- 
familiar to many users of statistics and for that reason should be 
used with caution and with full explanation of its significance. 

6. RELATIONS EXISTING AMONG AVERAGES 

The quantitative relations existing among the four averages 
discussed above are not constant, but in some types of frequency 
distributions they approximate to constant conditions. The rela- 
tions of the mean, median, and mode are determined by the degree 
of asymmetry of the frequency distribution. 

It has been noticed by Pearson and others that in certain mod- 
erately asymmetrical distributions the median is located at a point 
between the mean and the mode about one-third the distance from 
the mean in the direction of the mode, and the rule has been laid 
down that this may be taken as a fairly constant relation among 
the three measures of central tendency. In view of this fact, 
Pearson has proposed the following formula for determining a 
rough measure of the mode in moderately asymmetrical 
distributions: 


\[ Mo = M - 3(M - Md) \]

Obviously the mode computed by this formula will be twice as 
far from the median as the median is from the mean. The ques- 
tion to be raised regarding any frequency distribution is whether 
or not it is “moderately asymmetrical.” Preceded by the word 
“moderately,” this concept becomes qualitative and not quantita- 
tive. If defined in quantitative terms, it should mean any dis- 
tribution having a mode, determined by more exact methods than 
the method under discussion, twice as far from the median as the 
median is from the mean. The distribution presented in Figure L 
appears to the eye to be moderately asymmetrical but, when de- 
fined in terms of the above mean-median-mode relation, it is clear 
that it is not moderately asymmetrical because the median and 
the mode are very close together, while both are considerably 
higher than the mean. The important point concerning the relative 


10 Yule, op. cit., p. 126. 




positions of the mean, median, and mode is that the median and 
the mode always move in the direction of the skew of the fre- 
quency distribution. Consequently, they may be used along with 
the mean and the standard deviation from the mean as measures 
of skewness. 

To illustrate the relative positions of the mean, median, and 
mode in a moderately asymmetrical distribution, the data from 
Table XIX are presented in graphic form above with the three 
averages indicated. 

For the data presented in Figure L the mean is 14.05 years, the 
median 14.33 years, and the mode 14.42 years. The three measures 
of central tendency are very close together, because this 
frequency distribution is only moderately asymmetrical. 
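
Pearson's rough rule, that the mode equals the mean less three times the difference between the mean and the median, can be applied to the three values just given. The following Python sketch is illustrative and not part of the original text; the function name `pearson_mode` is an assumption introduced here.

```python
def pearson_mode(mean, median):
    """Pearson's rough estimate of the mode: Mo = M - 3(M - Md)."""
    return mean - 3 * (mean - median)


mean_age = 14.05       # mean for the data of Figure L
median_age = 14.33     # median for the same data
observed_mode = 14.42  # mode found by more exact methods

estimated_mode = pearson_mode(mean_age, median_age)

# Under the rule, the mode lies twice as far from the median
# as the median lies from the mean.
rule_distance = 2 * abs(median_age - mean_age)
actual_distance = abs(observed_mode - median_age)

print(round(estimated_mode, 2), round(rule_distance, 2), round(actual_distance, 2))
```

The estimate overshoots the observed mode because the median and the mode lie much closer together here than the rule assumes, which is the point made above about this distribution.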

There is no constant relation between the mean and the geometric 
mean, except that the geometric mean is always somewhat 
smaller. This is due to the fact that in taking the logarithms of the 
quantities to obtain the geometric mean the extreme values are minimized. The 
degree of difference between the mean and the geometric mean 
will vary directly with the ratio of the standard deviation to the 
mean.¹¹ 


11 Yule, op. cit., pp. 123, 156. 


7. EXERCISES 

1. Data for computation of averages: 

TABLE XLIII 

Distribution by Ages of Parolees, Classified by Total and 
by Success 1 

Age in Years    Parolees,      Parolees Successful 
                All Classes    at Each Age 
(1)             (2)            (3) 

Total           1,004          557 

 7                  1            1 
 8                  1            1 
 9                  4            1 
10                  8            4 
11                 16            7 
12                 28           15 
13                 41           21 
14                 87           43 
15                111           65 
16                190          100 
17                190          102 
18                126           60 
19                 96           56 
20                 44           31 
21                 23           14 
22                 15           11 
23                  7            5 
24                  5            4 
25                  6            5 
26                  1            1 
27                  1            1 
28                  2            0 
29                  0            0 
30                  1            0 

1 Missouri Crime Survey, p. 469. New York: Macmillan, 1926. 

Table XLIII is derived from Table XVI of this report. 

(a) Find the modal age for all parolees and for successful 
parolees by a graphic method and by a formula. Compare 
the modal ages of the two groups. 

(b) Find the median age for each group in the table using the 
cumulative frequency curve method and a formula. Compare 
the median ages found. 

(c) Find the mean age of parolees in each column of the 
table, using both the long and the short method. Compare 
the two means. Why are they the same or approximately 
the same size? Could you call this a weighted mean? 

(d) Compare the mode, median, and mean found for each 
column of data. 

(e) Do these ages of parolees illustrate a symmetrical, mod- 
erately asymmetrical or highly asymmetrical distribution? 
How would you determine this, using only statistical meth- 
ods thus far described? 

2. Data for computation of averages: 

(a) Compute the mode, median, mean, and geometric mean 
wage for the 423 wage earners in this table. 

(b) By adding pairs of class-intervals, increase the class- 
interval to $200. Compute the mode, median, mean, and 
geometric mean wage from the results and compare the 
averages with (a). 

(c) Would you call this a moderately asymmetrical frequency 
distribution? 





TABLE XLIV 

Earnings of Chief Wage Earners in Families 1 

Earnings of Wage Earner      Number of Wage Earners 

Total 

$  800-  899 
   900-  999 
 1,000-1,099 
 1,100-1,199 
 1,200-1,299 
 1,300-1,399 
 1,400-1,499 
 1,500-1,599 
 1,600-1,699 
 1,700-1,799 
 1,800-1,899 
 1,900-1,999 

6 
11 
40 
50 
63 
63 
81 
45 
14 
6 
7 

 2,000-2,099                  2 
 2,100-2,199                  4 
 2,200-2,299                  0 
 2,300-2,399                  1 


1 



1 Houghteling, Leila, The Income and Standard of Living of 
Unskilled Laborers in Chicago, p. 27. University of Chicago Press, 1927. 


8. REFERENCES 

Chaddock, Robert E., Principles and Methods of Statistics, Chaps. VI, VII. 

Kelley, Truman L., Statistical Method, Chap. III. 

Mills, Frederick C., Statistical Methods, Chap. IV. 

Secrist, Horace, An Introduction to Statistical Methods, Chap. IX. 

Yule, G. U., An Introduction to the Theory of Statistics, Chap. VII. 



CHAPTER IX 


Measures of Dispersion 


1. INTRODUCTION 

In the preceding chapter we have been concerned with the tend- 
ency of values in social data to cluster around a central value. 
Measures of this tendency are useful in arriving at a shorthand 
description of the data. But the tendency of data to scatter below 
and above the central value is as noticeable as is concentration. 
An adequate description of a frequency distribution requires 
knowledge of both scatter and concentration. Scatter is usually 
referred to in statistics as dispersion or variation. Deviations from 
the central value may be due to chance; that is, the whole universe 
of a particular type of data, if it could be taken into consideration, 
would show dispersion about the average. Deviations from an 
average may be due to the method by which the sample was se- 
lected from the universe of similar data. The sample may not 
fairly represent the universe from which it was selected, in which 
case the amount of dispersion may be either less or more than it 
would be for the universe. Thus, deviations from the central value 
are due both to chance and to the method of sampling. 

Measures of dispersion are practical checks on the homogeneity 
of the data. The smaller the amount of dispersion around the 
average, the greater the homogeneity of the data for the trait 
measured. Conclusions drawn from the study of relatively homo- 
geneous data are more reliable than those drawn from the study 
of data which are highly heterogeneous. The amount of dispersion 
for a given sample may be less than the amount found in an
absolutely random sample from the universe of the same kind of social
phenomena. The measure of dispersion shows this fact, but at the 
same time it indicates a high degree of homogeneity, and conclu- 
sions drawn for this sample but not extended to any other data 
of the same universe will be correspondingly reliable. This may
be illustrated in various ways. For a number of years effort has 
been made by psychologists to find empirically a normal distribu- 
tion of intelligence among a sample of children. Terman found 
that a sample of 905 intelligence quotients, which he and his asso- 
ciates obtained, was distributed approximately as a bell-shaped 
curve. The measure of the dispersion of this random sample 
might then be taken as a close approximation to the measure of 
dispersion of intelligence quotients, if all children in the United 
States were examined. Certain school policies might be based upon 
the dispersion of intelligence in this random sample, but the 
amount of dispersion would be greater than it would be for a sam- 
ple of children attending a school which selects only children with 
I.Q’s, say, at or above 110; and it would be higher than the
amount of dispersion found among children in a school for the 
feeble-minded. Conclusions based upon the amount of dispersion 
of the I.Q’s would be more reliable in the formulation of specific 
policies for these schools than would conclusions affecting the 
policies of a school whose children had a normal distribution of 
I.Q’s. The homogeneity of intelligence among the children of the 
two schools would be high. The dispersion found in the age dis- 
tribution of workers in an industry is a measure indicating the 
policy of the industry to restrict employment to certain age groups 
or to disregard age. Compared with dispersion of ages in the 
general population, the dispersion in a particular industry might 
be small; this would suggest, as a matter for further study, that
possibly there is discrimination against workers over a certain age 
limit. 

A measure of dispersion may assist public health officials to 
judge the effectiveness of their work. A city which has the census 
tract system and uses the census tracts as public health units will 
serve as an illustration of this use of measures of dispersion. An 
average death rate for all tracts may be computed and the disper- 
sion found. Those tracts which deviate widely from the average 
rate have either exceptionally good or exceptionally bad health 
conditions. Those in which mortality is high, assuming a standard 
population has been used for computing rates, must have some 
health disadvantages. The location of these tracts by means of 
their dispersion from the average rate enables the health officials 
to concentrate efforts at those points which need improvement 
most. Thus measures of dispersion become aids to social control. 





One other use of measures of dispersion may be mentioned. In 
all measures of the degree of interrelationship between sets of 
social phenomena some measure of dispersion has to be used, be- 
cause relationship is expressed as a function of average variability, 
involving both direction and amount of variability. This use of 
measures of dispersion will become apparent when we take up the 
subject of correlation. 

2. THE QUARTILE DEVIATION 

The first and third quartiles of a frequency distribution indicate 
dispersion from the median as the average. The first quartile is 
the value of the item below which 25 per cent of the values fall, 
and the third quartile is the value of the item above which 25 per 
cent of the items fall. That is, between the first and third quartiles 
half the items in the frequency distribution are found. Like the 
median, the quartiles are position values. In order to determine 
their values the data must be arranged in class-intervals from 
lowest to highest values. The quartiles are not averages; they do 
not represent central tendency. They represent deviations from 
central tendency. For that reason they properly belong under the 
discussion of measures of dispersion. The quartile deviation is
half the difference between the third and first quartiles. The values
between the first and third quartiles are sometimes referred to as the
interquartile range, and the quartile deviation as the
semi-interquartile range.

Before the quartile deviation can be determined, the values of 
the first and third quartiles must be computed. They may be 
found by formulas similar to that used for locating the median
(p. 234):

$$Q_1 = l + \left(\frac{\frac{N}{4} - F}{f}\right) i$$

$Q_1$ = first quartile
l = lower limit of the class-interval in which the first quartile falls
N = total number of items in the frequency distribution
F = sum of all frequencies in classes below l
i = value of the class-interval
f = number of items in the class-interval containing the first quartile




Using the data for unemployed men in Boston and referring to
Table XXXVII, the substitutions would be as follows:

$$Q_1 = 25 + \left(\frac{\frac{21{,}262}{4} - 4{,}726}{2{,}448}\right) 5 = 26.2 \text{ years}$$

$Q_1$ is 26.2 years. That is, 25 per cent of the unemployed workers
in Boston were 26.2 years of age or less. The formula for determining
the third quartile may be written as follows:

$$Q_3 = l + \left(\frac{\frac{3N}{4} - F}{f}\right) i$$

In this formula the meaning of the symbols is not changed ex- 
cept that l refers to the lower limit of the class-interval in which 
the third quartile falls. The other symbols may be read as in the 
preceding formula. The only other change is in the multiplication 
of N by 3 in order to obtain three-fourths of the total frequencies,
reading upward from the lowest toward the highest. Substituting 
in this formula, we get:

$$Q_3 = 45 + \left(\frac{\frac{3(21{,}262)}{4} - 13{,}907}{2{,}195}\right) 5 = 49.6 \text{ years}$$

Q3 is 49.6 years. Seventy-five per cent of the unemployed workers 
in Boston were 49.6 years of age or less. 

The formula for the quartile deviation is:

$$Q = \frac{Q_3 - Q_1}{2}$$

Substituting the values of the first and third quartiles found for
the unemployment data in this formula, we get:

$$Q = \frac{49.6 - 26.2}{2} = 11.7 \text{ years, the quartile deviation}$$

If the data are ungrouped and are arranged in an array, the
first and third quartiles are easily determined by simply counting
off from the lowest value 25 and 75 per cent of the items, respec- 
tively. The formula for the quartile deviation may then be used. 
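
The interpolation used for the quartiles can be checked mechanically. The following is a minimal sketch in Python, offered only as an illustration: the function name `grouped_position_value` is hypothetical, and the class limits and frequencies are assumed to be those of the Boston unemployment data shown in the tables of this chapter.

```python
# Five-year age classes (lower limits) and frequencies of unemployed
# workers in Boston, as listed in the frequency column of this chapter's tables.
LOWER_LIMITS = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
FREQUENCIES = [13, 1745, 2968, 2448, 2176, 2323, 2234, 2195, 1786, 1544, 1107, 723]
CLASS_WIDTH = 5


def grouped_position_value(fraction, lower_limits, frequencies, width):
    """Interpolated value below which `fraction` of the items fall.

    Implements l + ((fraction * N - F) / f) * i for grouped data.
    """
    n = sum(frequencies)
    target = fraction * n
    below = 0  # F: cumulated frequencies in the classes already passed
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and below + f >= target:
            return lower + (target - below) / f * width
        below += f
    raise ValueError("fraction must lie between 0 and 1")


q1 = grouped_position_value(0.25, LOWER_LIMITS, FREQUENCIES, CLASS_WIDTH)
q3 = grouped_position_value(0.75, LOWER_LIMITS, FREQUENCIES, CLASS_WIDTH)
quartile_deviation = (q3 - q1) / 2

print(round(q1, 1), round(q3, 1), round(quartile_deviation, 1))  # 26.2 49.6 11.7
```

The same function serves for the median if the fraction is set at 0.5, since all three are position values found by the same interpolation.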

The advantages of the quartile deviation as a measure of dis- 
persion are that it is a definite quantity, easily computed, and 
simple to understand. But it is a position measure of dispersion 
and does not lend itself to algebraic uses. Another limitation of 
the quartile deviation is the fact that it is not affected by the 
variability of the items whose values lie either between the first 
and third quartiles or outside of them. The quartile deviation is 
simply the mean deviation of the values of the first and third 
quartiles. If for special reasons the median is preferred as the 
average to be used, then the logical measure of dispersion to use 
with it is the quartile deviation. Otherwise, it is probably better 
to employ some other measure of dispersion. 

3. PERCENTILES AND DECILES 

Another measure of dispersion which resembles the quartile 
deviation in being a position value is the percentile. A percentile 
is a rank on a scale divided into 100 equal parts, and the value of 
any particular percentile is equal to the sum of the hundredths 
below and including the particular rank. It is a percentage concept. 
Deciles are simply the 10th, 20th, 30th, etc., percentiles. If a posi- 
tion measure of dispersion is to be used, percentiles or deciles are 
in some respects preferable to the quartiles, because they give a 
more detailed description of dispersion. For certain technical pur- 
poses the percentile measure of dispersion has been found very 
useful. Perhaps it has been used most by psychologists and educa- 
tional administrators for ranking school children according to 
intelligence or school ability. Some psychologists prefer to rank 
the children tested on a percentile scale rather than to assign I.Q’s. 
The percentile method may also be used for such purposes as 
ranking rates of piece workers in a factory, death rates by counties, 
birth rates by counties, crime rates by census tracts, etc. There is no 
statistical reason why the percentile method could not be applied 
to any type of data, but in practice its use has been confined largely 
to educational and psychological data. Yule suggests that it may 
also be used to show the distribution of non-measurable traits. 1 

The computation of percentiles will be illustrated from the 
following table which gives the infant mortality rates in 1929 for 
108 cities of the United States: 

1 Yule, op. cit., p. 150.





TABLE XLV

Percentile Distribution of Infant Mortality Rates in 108 Cities in the
United States, 1929 1

Infant Death Rate      f      Cumulated f     Percentiles at Mid-Point
                                              of Class-Interval
      (1)             (2)        (3)                  (4)

   30- 34.9            1           1                  0.47
   35- 39.9            2           3                  1.86
   40- 44.9            1           4                  3.27
   45- 49.9            9          13                  8.41
   50- 54.9            4          17                 13.95
   55- 59.9           17          34                 23.77
   60- 64.9           11          45                 36.79
   65- 69.9           20          65                 51.15
   70- 74.9           18          83                 68.82
   75- 79.9            4          87                 79.05
   80- 84.9            4          91                 82.77
   85- 89.9            1          92                 85.10
   90- 94.9            2          94                 86.49
   95- 99.9            3          97                 88.82
  100-104.9            0          97                 90.21
  105-109.9            1          98                 90.78
  110-114.9            2         100                 92.12
  115-119.9            2         102                 93.93
  120-124.9            2         104                 95.79
  125-129.9            1         105                 97.19
  130-134.9            1         106                 98.17
  135-139.9            2         108                 99.51

1 Weekly Health Index, United States Bureau of the Census, Vol. II, No. 35.

The formula for computing any percentile is:

$$P = l + \left(\frac{pN - F}{f}\right) i$$

P = the value of the percentile to be found
l = value of lower limit of class in which percentile occurs
p = per cent of cases having values equal to or less than P, written as a decimal
N = number of items in entire frequency distribution
F = total frequencies below particular percentile class-interval
f = frequencies in particular percentile class-interval
i = value of the class-interval

The similarity between this formula and the formula for the 
median is apparent. In each case the aim is to determine the value 
of an item at a certain position in a frequency distribution. In 
column (4) of Table XLV the percentiles which fall approximately
at the middle of the class-intervals are given; the infant
death rate is, of course, the mid-point of the class-interval opposite 
the percentile concerned. But how could the value of the 40th 
percentile be determined? The first thing to determine is the
class-interval in which the 40th percentile falls. Forty per cent of 
the 108 rates will be below the 40th percentile, and 40 per cent 
of 108 is 43.2. Hence, the 40th percentile falls in the class-interval 
60-64.9 because there are 45 frequencies below 64.9. Now, we 
may substitute in the formula: 

$$P = 60 + \left(\frac{(.40)(108) - 34}{11}\right) 5 = 64.2, \text{ the 40th percentile death rate}$$

Any other percentile may be found in like manner. 
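
As a check on the substitution above, the percentile formula may be sketched in Python. This is an illustration under stated assumptions rather than part of the original computation: the function name `percentile` is hypothetical, and the frequencies are those of column (2) of Table XLV.

```python
from bisect import bisect_left
from itertools import accumulate

# Table XLV: lower class limits 30, 35, ..., 135 and the class frequencies.
lower_limits = list(range(30, 140, 5))
frequencies = [1, 2, 1, 9, 4, 17, 11, 20, 18, 4, 4, 1, 2, 3, 0, 1, 2, 2, 2, 1, 1, 2]
width = 5


def percentile(p, lower_limits, frequencies, width):
    """Value of the p-th percentile: l + ((p/100 * N - F) / f) * i."""
    cumulated = list(accumulate(frequencies))
    n = cumulated[-1]
    target = p / 100 * n
    # First class whose cumulated frequency reaches the target position.
    k = bisect_left(cumulated, target)
    below = cumulated[k - 1] if k > 0 else 0  # F
    return lower_limits[k] + (target - below) / frequencies[k] * width


p40 = percentile(40, lower_limits, frequencies, width)
print(round(p40, 1))  # 64.2

# The deciles are simply every tenth percentile.
deciles = [round(percentile(p, lower_limits, frequencies, width), 1)
           for p in range(10, 100, 10)]
print(deciles)
```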

If a percentile curve is constructed for a set of data, any per- 
centile may be located graphically with a fair degree of accuracy. 
The form of the percentile curve is an ogive, such as that below 
in Figure LI. The percentiles for the mid-points of the class- 
intervals in Table XLV were used to plot this curve. 

The broken horizontal and vertical lines on the graph were drawn 
to locate the value of the 40th percentile. A line was drawn from 
the zero ordinate along the 40th abscissa until it intersected the 
curve. At this point a perpendicular line was dropped to the base. 
This perpendicular intersects the base line slightly above the 64th 
ordinate. That is, the value of the 40th percentile is a little more 
than 64 — by formula it was found to be 64.2. 

Sometimes the only percentiles wanted are the deciles, or every 
tenth percentile. A decile is determined in the same manner as 
any other percentile. 

The principal value of percentiles and deciles as measures of 
dispersion lies in their simplicity. We are accustomed to think in 
terms of percentage and tenths. Consequently, when it is said that 
40 per cent of the cities reporting infant mortality to the Bureau 
of the Census have rates of 64.2 or less, little explanation is re- 
quired. That is essentially what the 40th percentile means. If the 
90th percentile has a value of 100, we know that 10 per cent of 
the cities had infant mortality rates greater than 100, which is 
very high. To give the values of the deciles or the values of sev- 
eral percentiles at other points is to suggest the degree of disper- 
sion below and above the median percentile. In the case of 
intelligence ratings, the use of percentiles instead of I.Q’s may re- 
flect a healthy skepticism of intelligence tests and convey the 
meaning that the examiner is discussing only the distribution of 
intelligence in the group examined and that he is distributing
ratings of what the tests test, whether it be intelligence or some- 
thing else. 


4. THE AVERAGE, OR MEAN, DEVIATION 

The average deviation is the mean of the deviations from an 
average, disregarding algebraic signs. It may be computed from 
the mean, median, or mode, but generally the mean or the median 
is used. The sum of the deviations from the median is slightly less 
than the sum of the deviations from the mean. Hence, the average 
deviation is somewhat smaller when computed from the median 
than when computed from the mean. For this reason many statis- 
ticians think it best to use the median from which to compute the 
average deviation, unless practical considerations make the mean 
the preferable average. 2 Both methods will be illustrated.

The average deviation may be computed from either ungrouped 
or grouped data. Computation from ungrouped data will be illus- 
trated from the amount of relief per case given by twenty family 
relief agencies in cities reporting to the Russell Sage Foundation: 

TABLE XLVI

Computation of the Average Deviation from Ungrouped Data: Amounts of
Relief per Relief Case in 20 Family Relief Agencies in July, 1931

Relief per        Deviations        Deviations
Relief Case       from Mean         from Median
     X                d                 d

   $26.85          -$ 3.196          $   .615
    28.99          -  1.056             2.755
    48.62            18.574            22.385
    34.31             4.264             8.075
    24.30          -  5.746          -  1.935
    12.31          - 17.736          - 13.925
    34.40             4.354             8.165
    25.62          -  4.426          -   .615
    45.92            15.874            19.685
    38.59             8.544            12.355
    52.45            22.404            26.215
    24.05          -  5.996          -  2.185
    21.17          -  8.876          -  5.065
    24.23          -  5.816          -  2.005
    36.64             6.594            10.405
    19.08          - 10.966          -  7.155
    18.61          - 11.436          -  7.625
    42.99            12.944            16.755
    17.78          - 12.266          -  8.455
    24.01          -  6.036          -  2.225

                   $187.104          $178.600

M = 30.046        Md = 26.235

2 Yule, op. cit., p. 145.

The algebraic signs have been inserted to show that the algebraic
sum of the deviations from the mean is zero; the minus and plus
deviations are actually -93.552 and +93.552. But the algebraic sum of the
deviations from the median is not zero; the deviations are
-51.190 and +127.410. Since signs are neglected in computing
the average deviation, the inequality of the plus and minus deviations
from the median does not affect the result. It should be
noted that it is necessary to carry the deviations from the mean
to three decimal places in order to make the plus and minus
deviations equal.

The formula for the computation of the average deviation from
either mean or median is:

$$\text{A.D.} = \frac{\Sigma d}{N}$$

A.D. = average deviation
d = deviation from the average, signs disregarded
N = number of items

Substituting in this formula the values for deviations from the
mean, we have:

$$\text{A.D.} = \frac{187.104}{20} = 9.355$$

Using the values of the deviations from the median, we have:

$$\text{A.D.} = \frac{178.600}{20} = 8.930$$

The average deviation from the mean is .425 larger than the
average deviation from the median. This indicates that the values
of the items cluster a little more closely about the median than 
they do about the mean. But the location of the median precludes 
the full influence of the higher deviations, as can be seen from 
the excess of plus over minus deviations from the median. If the 
purpose of the worker is to allow full weight to all deviations, 
then the average deviation from the mean would be the one to 
use. If he wants to emphasize the value from which the sum of 
the deviations is least, then he should use the median. 
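
The computation from ungrouped data may be sketched in Python as follows. This is only an illustration: the helper name `average_deviation` is hypothetical, and the twenty amounts are those of column X in Table XLVI. The printed figures can be compared with the totals of the table.

```python
from statistics import mean, median

# Relief per relief case in the 20 agencies (column X of Table XLVI).
relief = [26.85, 28.99, 48.62, 34.31, 24.30, 12.31, 34.40, 25.62, 45.92, 38.59,
          52.45, 24.05, 21.17, 24.23, 36.64, 19.08, 18.61, 42.99, 17.78, 24.01]


def average_deviation(values, center):
    """Mean of the deviations from `center`, disregarding algebraic signs."""
    return sum(abs(x - center) for x in values) / len(values)


m = mean(relief)
md = median(relief)
ad_mean = average_deviation(relief, m)
ad_median = average_deviation(relief, md)

print("mean:", round(m, 3), " median:", round(md, 3))
print("A.D. from mean:", round(ad_mean, 3))
print("A.D. from median:", round(ad_median, 3))

# The algebraic sum of the deviations from the mean is zero (to rounding).
print(abs(sum(x - m for x in relief)) < 1e-9)  # True
```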

It does not often happen in practice that the data used are 
ungrouped. For that reason it is necessary to have a method for
computing the average deviation from grouped data. The long 
method of computing the average deviation will be illustrated 
first. The data used will be the ages of unemployed workers in 
Boston. Table XLVII shows the details of this method:

TABLE XLVII

Computation of the Average Deviation from the Mean and from the Median
for the Ages of Unemployed Workers in Boston

                  Deviations from Mean               Deviations from Median
  Age      m         (m - M)        f        fd         (m - Md)         fd
                        d                                   d
  (1)     (2)          (3)         (4)       (5)           (6)           (7)

 10-14    12.5        26.2           13       340.6        25.3           328.9
 15-19    17.5        21.2        1,745    36,994.0        20.3        35,423.5
 20-24    22.5        16.2        2,968    48,081.6        15.3        45,410.4
 25-29    27.5        11.2        2,448    27,417.6        10.3        25,214.4
 30-34    32.5         6.2        2,176    13,491.2         5.3        11,532.8
 35-39    37.5         1.2        2,323     2,787.6          .3           696.9
 40-44    42.5         3.8        2,234     8,489.2         4.7        10,499.8
 45-49    47.5         8.8        2,195    19,316.0         9.7        21,291.5
 50-54    52.5        13.8        1,786    24,646.8        14.7        26,254.2
 55-59    57.5        18.8        1,544    29,027.2        19.7        30,416.8
 60-64    62.5        23.8        1,107    26,346.6        24.7        27,342.9
 65-69    67.5        28.8          723    20,822.4        29.7        21,473.1

                                 21,262   257,760.8                   255,885.2

M = 38.7        Md = 37.8


The principal difference in the computation from grouped data as 
compared with ungrouped data is that the deviations from the 
average are taken from the mid-value of the class-interval and 
then multiplied by the class frequencies. The deviations of m,
the mid-values, from the average are shown in columns (3) and (6), and the
frequencies are given in column (4). The products of the deviations
and the frequencies are given in columns (5) and (7). The results 
are: 


Using the mean:

$$\text{A.D.} = \frac{\Sigma fd}{N} = \frac{257{,}760.8}{21{,}262} = 12.1 \text{ years}$$

Using the median:

$$\text{A.D.} = \frac{\Sigma fd}{N} = \frac{255{,}885.2}{21{,}262} = 12.0 \text{ years}$$

There is in the two average deviations a difference of .1 of a year. 
This is so small as to be unimportant except for theoretical 
purposes. 
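
The long method for grouped data amounts to weighting each mid-value deviation by its class frequency. A minimal Python sketch follows; the function name `grouped_average_deviation` is hypothetical, and the mean (38.7) and median (37.8) are taken as given in Table XLVII rather than recomputed.

```python
# Mid-values and frequencies of the twelve five-year age classes (Table XLVII).
midpoints = [12.5 + 5 * k for k in range(12)]  # 12.5, 17.5, ..., 67.5
frequencies = [13, 1745, 2968, 2448, 2176, 2323, 2234, 2195, 1786, 1544, 1107, 723]


def grouped_average_deviation(midpoints, frequencies, center):
    """Sum of f * |m - center| divided by N, signs disregarded."""
    n = sum(frequencies)
    total = sum(f * abs(m - center) for m, f in zip(midpoints, frequencies))
    return total / n


ad_mean = grouped_average_deviation(midpoints, frequencies, 38.7)
ad_median = grouped_average_deviation(midpoints, frequencies, 37.8)

print(round(ad_mean, 1), round(ad_median, 1))  # 12.1 12.0
```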

This long method requires the use of large numbers and much 
labor. The same results can be obtained by using a short method 
for computing the average deviation. This short method is illus- 
trated below: 


TABLE XLVIII

Computation of the Average Deviation for the Same Data by the Short
Method

                              Step-Deviations               Step-Deviations
  Age      m         f        from Assumed        fd        from Assumed        fd
                                  Mean                         Median
                                   d                             d
  (1)     (2)       (3)           (4)            (5)            (6)            (7)

 10-14    12.5        13          -5               65           -5               65
 15-19    17.5     1,745          -4            6,980           -4            6,980
 20-24    22.5     2,968          -3            8,904           -3            8,904
 25-29    27.5     2,448          -2            4,896           -2            4,896
 30-34    32.5     2,176          -1            2,176           -1            2,176
 35-39    37.5     2,323           0                0            0                0
 40-44    42.5     2,234           1            2,234            1            2,234
 45-49    47.5     2,195           2            4,390            2            4,390
 50-54    52.5     1,786           3            5,358            3            5,358
 55-59    57.5     1,544           4            6,176            4            6,176
 60-64    62.5     1,107           5            5,535            5            5,535
 65-69    67.5       723           6            4,338            6            4,338

                  21,262                       51,052                        51,052

Frequencies in classes 10-14 to 35-39 = 11,673; frequencies in classes
40-44 to 65-69 = 9,589.


The data above may be substituted in the following formula:

$$\text{A.D.} = \left(\frac{\Sigma fd + (n_1 - n_2)c}{N}\right) i$$

in which $n_1$ is the number of items for which deviations measured
from the assumed average are smaller than deviations measured
from the true average; $n_2$ is the number of items for which
deviations measured from the assumed average are larger than
deviations measured from the true average; c is the difference
between the assumed average and the true average; and i is the
value of the class-interval. Since the deviations above are not in
terms of years but in terms of steps, or class-intervals, c must be
expressed as a fraction of a step:





$$\text{A.D.} = \left(\frac{51{,}052 + (11{,}673 - 9{,}589)(.24)}{21{,}262}\right) 5 = 12.1 \text{ years (using the mean)}$$

$$\text{A.D.} = \left(\frac{51{,}052 + (11{,}673 - 9{,}589)(.06)}{21{,}262}\right) 5 = 12.0 \text{ years (using the median)}$$

The results by the short method are identical with those obtained
by the long method. In the illustration the items whose deviations
are too small, $n_1$, outnumber those whose deviations are too large,
$n_2$, because the true mean, 38.7, is nearer
the upper limit of class 35-39 than is the assumed mean, 37.5;
but sometimes the situation will be reversed, in which case $n_2$ will
be larger than $n_1$, and it will be necessary to subtract the correction
factor instead of adding it. But this should be clear from the
formula. If the expression inside the parentheses is a minus quantity,
then the plus sign in front of the parenthesis leaves it a minus
and indicates subtraction, because a plus times a minus gives a
minus quantity. The deviations on the side of the assumed average
will always be smaller than they should be. In the illustration the
assumed average is less than the true average. Hence, the deviations
of all the frequencies in the class-interval containing the assumed
average and of all those in lower class-intervals will be too small by
the amount of the correction factor. The deviations in all class-intervals higher
than that in which the assumed average falls will be too large by
the amount of the correction factor. Suppose the assumed average
is higher than the true average. The rule still holds, but the small
deviations are now at the upper end of the distribution, and the
large deviations are at the lower end. (The average deviations from
the mean and from the median differ by only .1 of a year in this
problem, but so close an agreement would not generally be found.)
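
That the short method reproduces the long method exactly can be verified by computing both. The following Python sketch is illustrative only; the names `short_method` and `long_method` are hypothetical, and the sketch assumes, as in the illustration, that the true average lies above the assumed average but below the next mid-value.

```python
import math

WIDTH = 5
ASSUMED = 37.5  # mid-value of class 35-39, taken as the assumed average
MIDPOINTS = [12.5 + WIDTH * k for k in range(12)]
FREQUENCIES = [13, 1745, 2968, 2448, 2176, 2323, 2234, 2195, 1786, 1544, 1107, 723]
N = sum(FREQUENCIES)

# Step-deviations from the assumed average: -5, -4, ..., 6.
steps = [round((m - ASSUMED) / WIDTH) for m in MIDPOINTS]
sum_fd = sum(f * abs(d) for f, d in zip(FREQUENCIES, steps))
# n1: items whose deviations from the assumed average are too small,
# i.e. those at or below the assumed average when the true average is higher.
n1 = sum(f for f, d in zip(FREQUENCIES, steps) if d <= 0)
n2 = N - n1


def short_method(true_average):
    c = (true_average - ASSUMED) / WIDTH  # correction as a fraction of a step
    return (sum_fd + (n1 - n2) * c) / N * WIDTH


def long_method(true_average):
    return sum(f * abs(m - true_average) for m, f in zip(MIDPOINTS, FREQUENCIES)) / N


for average in (38.7, 37.8):  # the mean and the median of the illustration
    assert math.isclose(short_method(average), long_method(average))
    print(average, round(short_method(average), 1))
```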

Occasionally it may be desirable to obtain the average deviation 
of death rates in a city for a period of twenty years. The average 
deviation can be found by using the method for ungrouped data. 
A caution should be mentioned, however. Time series are com- 
plex. They generally show four types of variation: trend, cycle, 
seasonal fluctuation, and residual fluctuation. A measure of dis- 
persion applied to time series usually means less than when applied 
to frequency distributions.

The average deviation is simple to compute. It may be computed 
from either grouped or ungrouped data, and either the mean or 
the median may be used as the average. Although useful, the 
average deviation is not employed as much, as a step in further 
statistical analysis, as is the standard deviation. 





5. STANDARD DEVIATION 

The standard deviation is the square root of the mean of the 
squares of the deviations from the arithmetic mean. It is never 
computed from any average but the mean. The concept of the 
standard deviation developed in connection with studies of the 
normal curve of error during the nineteenth century. Efforts to 
measure the probability of error due to chance resulted in the con- 
cepts known as the “modulus,” the “mean error,” and the “prob- 
able error.” Working with biological data, Karl Pearson found it 
more convenient to work with the concept to which he gave the 
term, standard deviation. 3 The method had been used before this
time, but Pearson’s use has given it currency. The standard devia- 
tion enters into so much statistical analysis that it is particularly 
important for the student to understand its meaning and its 
method of computation. 

The method of computation is similar to that of the average 
deviation, except that the deviations are squared, which disposes 
of the algebraic signs by making all signs plus. The standard 
deviation may be computed from grouped or ungrouped data, 
and it may be computed by both the short and the long method. 
The long method will be illustrated first by the use of the ages of 
unemployed workers in Boston. 

TABLE XLIX

Computation of the Standard Deviation of the Ages of Unemployed Workers
in Boston by the Long Method

  Age      m         f         d         d²             fd²
  (1)     (2)       (3)       (4)        (5)            (6)

 10-14    12.5        13      26.2     686.44         8,923.72
 15-19    17.5     1,745      21.2     449.44       784,272.80
 20-24    22.5     2,968      16.2     262.44       778,921.92
 25-29    27.5     2,448      11.2     125.44       307,077.12
 30-34    32.5     2,176       6.2      38.44        83,645.44
 35-39    37.5     2,323       1.2       1.44         3,345.12
 40-44    42.5     2,234       3.8      14.44        32,258.96
 45-49    47.5     2,195       8.8      77.44       169,980.80
 50-54    52.5     1,786      13.8     190.44       340,125.84
 55-59    57.5     1,544      18.8     353.44       545,711.36
 60-64    62.5     1,107      23.8     566.44       627,049.08
 65-69    67.5       723      28.8     829.44       599,685.12

                  21,262                          4,280,997.28


3 Walker, Helen M., Studies in the History of Statistical Method, pp. 52-54,
64. Williams & Wilkins, Baltimore, 1929.





The symbols in this table have the same meaning which they 
have in the formula for the average deviation, and the general 
formula for the standard deviation computed from grouped data 
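is as follows:

$$\sigma = \sqrt{\frac{\Sigma fd^2}{N}}$$

Small sigma is the symbol for the standard deviation. Substituting
the data from Table XLIX in this formula, we have:

$$\sigma = \sqrt{\frac{4{,}280{,}997.28}{21{,}262}} = 14.2 \text{ years}$$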

The standard deviation is somewhat larger than the average and 
the quartile deviations. The relations of these three measures of 
dispersion will be discussed later in the chapter. 

If the data are ungrouped, the procedure is simple. The formula
is

$$\sigma = \sqrt{\frac{\Sigma d^2}{N}}$$

in which d is the deviation from the arithmetic mean. The sum of 
the squared deviations from the mean is divided by the number of 
items, and the square root of the result is taken, giving the stand- 
ard deviation. 
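
Both forms of the long method may be sketched in Python. The sketch is an illustration only: the function names are hypothetical, the grouped data are the Boston ages with the mean taken as 38.7, and the short list `sample` is an invented set of values used merely to compare the ungrouped formula with the standard library's `statistics.pstdev`, which likewise divides by N.

```python
import math
from statistics import pstdev

midpoints = [12.5 + 5 * k for k in range(12)]  # 12.5, 17.5, ..., 67.5
frequencies = [13, 1745, 2968, 2448, 2176, 2323, 2234, 2195, 1786, 1544, 1107, 723]


def grouped_standard_deviation(midpoints, frequencies, mean):
    """Square root of (sum of f * d^2) / N, with d = m - mean."""
    n = sum(frequencies)
    sum_fd2 = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies))
    return math.sqrt(sum_fd2 / n)


def standard_deviation(values):
    """Ungrouped form: square root of (sum of d^2) / N."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


sigma = grouped_standard_deviation(midpoints, frequencies, 38.7)
print(round(sigma, 1))  # 14.2

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, for comparison only
print(standard_deviation(sample), pstdev(sample))  # 2.0 2.0
```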


There is even more reason for using a short method of computing
the standard deviation than for computing averages or the
average deviation, because squaring the deviations from the mean
increases the size of the numbers handled to very large quantities.
The short method is illustrated in the next table.

TABLE L

Computation of the Standard Deviation of the Ages of Unemployed Workers
in Boston by the Short Method

                              Steps
  Age      m         f          d        -fd        +fd         fd²
  (1)     (2)       (3)        (4)       (5)        (6)         (7)

 10-14    12.5        13       -5          65                     325
 15-19    17.5     1,745       -4       6,980                  27,920
 20-24    22.5     2,968       -3       8,904                  26,712
 25-29    27.5     2,448       -2       4,896                   9,792
 30-34    32.5     2,176       -1       2,176                   2,176
 35-39    37.5     2,323        0
 40-44    42.5     2,234        1                  2,234        2,234
 45-49    47.5     2,195        2                  4,390        8,780
 50-54    52.5     1,786        3                  5,358       16,074
 55-59    57.5     1,544        4                  6,176       24,704
 60-64    62.5     1,107        5                  5,535       27,675
 65-69    67.5       723        6                  4,338       26,028

                  21,262               23,021     28,031      172,420

$$\sigma = \sqrt{\frac{172{,}420}{21{,}262} - \left(\frac{28{,}031 - 23{,}021}{21{,}262}\right)^2} = 2.84 \text{ step deviations}$$

Multiplying by 5,

$$\sigma = 14.2 \text{ years}$$


Using the short method, the standard deviation is identical with 
the standard deviation computed by the long method. But the 
numbers handled are smaller, and this makes for rapidity of computation
and reduces the chances of error. It should be noted that
the correction factor computed in the use of the short method is
always subtracted from the sum of the fd²'s divided by the number
of items, and that it is squared before subtracting. As suggested
above, the reason for this is that the sum of the squares of the deviations
from the arithmetic mean is a minimum. It follows, therefore,
that, if any correction is required, it must be because the sum of
the squared deviations from the assumed mean is too large and, hence,
must be decreased by the amount of the correction factor. In the
above case the assumed mean is 37.5, whereas the true mean is
38.7. The result is that the mean of the squared deviations is too
large by the square of the correction factor. Since the deviations under
the radical are already squared, it follows that the correction factor must be
squared before deduction.
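
The short method may be sketched in Python as follows. This is an illustration, not the author's own working: the variable names are hypothetical, and the step-deviations are measured from the assumed mean of 37.5 as in Table L.

```python
import math

WIDTH = 5  # value of the class-interval, in years
FREQUENCIES = [13, 1745, 2968, 2448, 2176, 2323, 2234, 2195, 1786, 1544, 1107, 723]
STEPS = list(range(-5, 7))  # step-deviations from the assumed mean, 37.5

n = sum(FREQUENCIES)
sum_fd = sum(f * d for f, d in zip(FREQUENCIES, STEPS))       # 28,031 - 23,021
sum_fd2 = sum(f * d * d for f, d in zip(FREQUENCIES, STEPS))  # 172,420

c = sum_fd / n  # correction factor, in steps
sigma_steps = math.sqrt(sum_fd2 / n - c ** 2)  # the correction is squared, then subtracted
sigma_years = sigma_steps * WIDTH

print(round(sigma_steps, 2), round(sigma_years, 1))  # 2.84 14.2
```

Because c is squared, the correction is subtracted whether the assumed mean lies below or above the true mean.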

We may summarize the advantages which make the standard 
deviation preferable to any other measure of dispersion, unless 
special reasons exist for using some other measure. Squaring re- 
moves the differences of signs and gives weight to extreme varia- 
tions. The standard deviation lends itself to algebraic treatment, 
is rigidly defined, is based upon all observations, is the most com- 
monly used measure of dispersion, and is a step in many other 
statistical procedures. The squaring and extraction of the square 
root may appear to be rather complicated, but practice reduces 
this apparent difficulty, and the use of a table of squares and
square roots reduces the labor to a matter of listing the squares
and roots.

3a The correction factor, $c^2$, is $\left(\frac{\Sigma fd}{N}\right)^2$.


6. RELATIONS OF Q, A.D., AND σ

In a perfectly symmetrical frequency distribution constant rela- 
tions exist among the quartile, the average, and the standard devia- 
tion. It is rare in social statistics to find even a close approximation 
to a symmetrical distribution, but some distributions are sufficiently 
symmetrical to make significant comparisons with moderately 
asymmetrical distributions. The following table gives the ratios of 
each of the three measures of dispersion to the others, as com- 
puted by Thorndike: 


TABLE LI

The Relative Values of Three Measures of Dispersion

Measures of      Perfectly Symmetrical      Ages of 21,262             Differences
Dispersion       Distribution 1             Unemployed Workers         (2) - (3)
   (1)                 (2)                        (3)                     (4)

    σ            1.2533 times A.D.          1.1736 times A.D.             .0797
    σ            1.4825   "   Q             1.2137   "   Q                .2688
   A.D.           .7979   "   σ              .8521   "   σ               -.0542
   A.D.          1.1843   "   Q             1.0342   "   Q                .1501
    Q             .6745   "   σ              .8239   "   σ               -.1494
    Q             .8453   "   A.D.           .9669   "   A.D.            -.1216

1 Thorndike, E. L., Mental and Social Measurements, 2 ed., 1913, p. 67.


The differences between a perfectly symmetrical distribution and 
the distribution of ages of the unemployed workers are not large 
but they suggest a considerable variation of the latter from the 
bell-shaped curve. The ideal curve is a norm to which other curves 
approach more or less closely. 

The differences between the different measures of dispersion are 
shown graphically below: 

[Figure LII. Area of Surface Enclosed by Plus and Minus Once the
Quartile Deviation from the Median Age of Boston Workers. X = ages;
Y = unemployed workers; the median (Md) and the limits -Q and +Q are
marked.]

[Figure LIII. Areas of Surface Enclosed by Plus and Minus Once the
Average Deviation and by Plus and Minus Once the Standard Deviation
from the Mean Age of Boston Workers.]

It is clear from the diagrams that plus and minus once the
quartile deviation from the median, plus and minus once the average
deviation from the mean, and plus and minus once the standard
deviation from the mean include an increasing proportion of all
the items in the order named. In a perfectly symmetrical
distribution 50 per cent of all the items fall between the value equal
to the median minus Q and the value equal to the median plus Q.
In a perfectly symmetrical distribution 57.5 per cent of all the
items are included between the value equal to the mean or median
minus the average deviation and the value equal to the mean or
median plus the average deviation. Similarly, in a perfectly
symmetrical distribution 68.26 per cent of all the items are included
between the value equal to the mean minus the standard deviation
and the value equal to the mean plus the standard deviation. The
corresponding percentages in asymmetrical distributions will differ
in varying amounts from these quantities for a normal distribution.
In a normal distribution the values equal to plus and minus twice
the standard deviation from the mean will include approximately
95.5 per cent, and the values equal to plus and minus three times
the standard deviation from the mean will include approximately
99.7 per cent of all items. In asymmetrical distributions the
percentages will vary, but it is a good rule to remember that the
above percentages hold for ideal distributions.
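
The percentages quoted above for the ideal curve can be checked numerically. What follows is a minimal sketch in Python, on the assumption that the "perfectly symmetrical" or ideal distribution is the normal curve. The function name `normal_coverage` and the constant names are illustrative and do not come from the text.

```python
import math


def normal_coverage(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


# In a normal distribution Q and A. D. are fixed fractions of sigma.
Q_IN_SIGMAS = 0.6745                       # quartile deviation (Table LI)
AD_IN_SIGMAS = math.sqrt(2.0 / math.pi)    # average deviation, about .7979

coverage = {
    "median +/- Q": normal_coverage(Q_IN_SIGMAS),
    "mean +/- A. D.": normal_coverage(AD_IN_SIGMAS),
    "mean +/- 1 sigma": normal_coverage(1.0),
    "mean +/- 2 sigma": normal_coverage(2.0),
    "mean +/- 3 sigma": normal_coverage(3.0),
}

for label, share in coverage.items():
    print(f"{label:18s} {100.0 * share:6.2f} per cent")
```

The same function may be applied to any multiple of the standard deviation; for a skewed distribution such as the ages of the unemployed workers the observed shares would have to be counted from the data instead.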

7. COEFFICIENT OF RELATIVE VARIABILITY 

The measures we have been discussing are measures of absolute 
variability. Sometimes, however, it is desirable to compare the 
variability of two statistical series expressed in different units of 
measurement. For example, we might want to express the com- 
parative variability of wages expressed in weekly amounts and 
salaries expressed in monthly amounts. Obviously, the standard 
deviations of the two series would not be comparable. Some way 
must be found for expressing the relative variability of these two 
quantities. The required measure of relative variability will be the 
ratio of the measure of absolute variability to an average. In order 
to express the ratio as a percentage, it may be multiplied by 100. 

There are several ways of computing the coefficient of relative 
variability, depending upon the measure of absolute variability 
and the type of average used. The formulas for computing the 
coefficient of relative variability are as follows:

$$V = \frac{\sigma}{M}$$

$$V = \frac{A.D.}{Md,\ M,\ \text{or}\ Mo}$$


If the average deviation is used, the coefficient of relative varia- 
bility may be computed with the use of the median, the mean, or 
the mode, but the same average should be used in this formula as 
was used in computing the average deviation. The use of these 
two formulas will be illustrated below, using the data for Boston 
unemployed workers: 



$$V = \frac{\sigma}{M} = \frac{14.2}{38.7} = .367, \text{ or 36.7 per cent}$$

Using the average deviation from the median, instead of the 
standard deviation from the mean, the substitution is as follows: 


$$V = \frac{A.D.}{Md} = \frac{12.1}{37.8} = .320, \text{ or 32.0 per cent}$$

There is no particular advantage in changing the ratio to a per- 
centage except that we are more accustomed to thinking in terms 
of percentage. On the basis of the standard deviation, which is the 
most common way of computing the coefficient of relative varia- 
bility, the coefficient of relative variability is 36.7 per cent. The 
ages of unemployed workers in some other city might be taken 
for purposes of comparison and the coefficient of variability com- 
puted to see whether there was less or more variability in the 
other city. A low coefficient of relative variability, like a low 
measure of absolute variability, indicates a high degree of homo- 
geneity in the data for the trait measured. 
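
The two substitutions can be repeated in a few lines of Python. This is only a sketch: the function name `relative_variability` is illustrative, and the average deviation of 12.1 years is an assumption inferred from Table LI (.8521 times the standard deviation of 14.2) rather than a figure stated in this section.

```python
def relative_variability(dispersion, average):
    """Ratio of a measure of absolute variability to an average."""
    return dispersion / average


# Boston unemployed workers, ages in years.
SIGMA = 14.2           # standard deviation
MEAN = 38.7            # arithmetic mean
MEDIAN = 37.8          # median
AD_FROM_MEDIAN = 12.1  # assumed: inferred from Table LI, not quoted in the text

v_sigma = relative_variability(SIGMA, MEAN)
v_ad = relative_variability(AD_FROM_MEDIAN, MEDIAN)

print(f"sigma / M   = {v_sigma:.3f}, or {100.0 * v_sigma:.1f} per cent")
print(f"A. D. / Md  = {v_ad:.3f}, or {100.0 * v_ad:.1f} per cent")
```

Because the coefficient is a pure ratio, the same function serves for weekly wages and monthly salaries alike, which is the comparison the section has in view.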

8. MEASURES OF SKEWNESS 

Up to this point the discussion of variability has been concerned 
with the individual items — the average variation of each item from 
some measure of central tendency. But sometimes it is desirable to 
have a measure of the variability of the whole mass of data. 
Previous measures of variability have not indicated the direction 
in which variability is most pronounced — that is, toward the lower 
or the higher values. The measure of this type of variability is 
called a measure of skewness. When data are plotted in frequency 
curves, they may be concentrated at one end or the other of the 
distribution — that is, the distribution may be skewed, as most 
empirical frequency distributions are. Hence, a measure of skew- 
ness shows the amount of skewness and the direction of the skew. 
Looking at Figure LIII, it is obvious that there is a concentration 
of ages at the lower end of the scale and that the tail of the 
curve is longer in the direction of the high age groups, which 
means that the age distribution is skewed in that direction. 

Skewness is a function of both central tendency and variation
from central tendency. Wherefore, it should be measured in terms
of these quantities. Two formulas are commonly used for
computation of skewness:

$$Sk = \frac{M - Mo}{\sigma}$$

in which M is the arithmetic mean and Mo the mode. This is 
Karl Pearson’s formula. The other formula is: 


$$Q_1 + Q_3 - 2\,Md$$


Substituting in the first formula to find the skewness in the dis- 
tribution of Boston unemployed workers, we have: 

$$Sk = \frac{38.7 - 36.0}{14.2} = +.19$$

This distribution is, thus, skewed slightly in the direction of the 
higher values. The mode was computed by the formula: 

Mo = M - 3(M - Md) 

The skew ordinarily varies from 0 to ±1; with the mode estimated by
the formula above, it can never exceed ±3.
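
The computation of skewness for the Boston data may be sketched in Python as follows. The function names are illustrative, not taken from the text; the mean, median, and standard deviation are the values used in the substitutions above.

```python
def estimated_mode(mean, median):
    """Mode estimated from the mean and median: Mo = M - 3(M - Md)."""
    return mean - 3.0 * (mean - median)


def pearson_skewness(mean, mode, sigma):
    """Pearson's measure of skewness: (M - Mo) / sigma."""
    return (mean - mode) / sigma


MEAN = 38.7
MEDIAN = 37.8
SIGMA = 14.2

mode = estimated_mode(MEAN, MEDIAN)
sk = pearson_skewness(MEAN, mode, SIGMA)

print(f"Mo = {mode:.1f}")
print(f"Sk = {sk:+.2f}")
```

A positive result indicates a tail toward the higher values, and a negative result a tail toward the lower values.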

9. EXERCISES 

1. The following table gives the number of unemployed male 
workers in Chicago : 

TABLE LII

Unemployed Male Workers in Chicago at the Time of the
Census in April, 1930, According to Age. Class A 1

Age in Years          Number Unemployed

Total                      122,685

10-14                           19
15-19                        9,399
20-24                       18,283
25-29                       15,686
30-34                       13,870
35-39                       15,014
40-44                       13,996
45-49                       12,602
50-54                        9,439
55-59                        6,790
60-64                        4,784
65-69                        2,803


1 Unemployment Bulletin , Illinois , by the United States Bureau 
of the Census, 1930. 





(a) Find the quartile deviation of the above age distribution. 

(b) Find the average deviation of the above age distribution. 

(c) Find the standard deviation of the above age distribution. 

(d) Find the coefficient of relative variability for the above
distribution.

(e) Find the coefficient of skewness for the above distribution. 

(f) Compare your measures of dispersion with the measures 
of dispersion for the Boston unemployed men. Are there 
significant differences? If so, how do you account for 
them? 

2. Consult the United States Census of 1930 concerning marital 
status in your own state. Compute by counties the per cent of 
the total population which is married. What is the standard 
deviation of these percentages? Do the same thing for another 
state in a different geographical section of the country. What 
differences do you find? How do you account for them? Can 
you see any way by which the differences in percentages married
in different counties might affect such social problems as
crime and dependency? 

3. The following table gives the ratio of males per 100 females 
admitted to hospitals for the insane in 1927: 

TABLE LIII

Ratio of Males per 100 Females Admitted to Hospitals for
the Insane by States in 1927 1

State                     Males per 100 Females

United States                    140.4

Alabama                          101.9
Arkansas                         134.9
California                       179.5
Connecticut                      138.7
District of Columbia             295.3
Florida                          142.7
Georgia                          108.0
Illinois                         167.7
Indiana                          123.8
Iowa                             144.9
Kansas                           148.2
Kentucky                         141.3
Maine                            134.9
Maryland                         127.9
Massachusetts                    113.1
Michigan                         183.1
Minnesota                        151.0
Mississippi                      149.6
Missouri                         135.1



TABLE LIII, Continued


State                     Males per 100 Females


Nebraska
New Hampshire
New Jersey
New York
North Carolina
Ohio
Oklahoma
Oregon
Pennsylvania
Rhode Island
South Carolina
Tennessee
Texas
Virginia
Washington
West Virginia
Wisconsin


164.0 

102.0 
129.8 

128.2 
117.9
M 2.3 
139.6
196.5 

123.0 

138.0 

104.2 

123.7 

111.8


128.3 

185.3


1 Mental Patients in State Hospitals, United States Bureau of the
Census, 1930. A few states are omitted, because rates are not given.


(a) Compute the 10th, 25th, 40th, 50th, 60th, 75th, and 90th 
percentiles for the ratios in the above distribution. 

(b) Is there any way to account for the wide variation in the 
ratios? What about differences in administrative policies, 
differences in racial or national composition of the popula- 
tion, or the sex ratio in the states? 

10. REFERENCES

Chaddock, Robert E., Principles and Methods of Statistics, Chap. IX.

Kelley, T. L., Statistical Method, Chap. IV.

Mills, Frederick C., Statistical Methods, Chap. V.

Thurstone, L. L., The Fundamentals of Statistics, Chaps. 13-16.

Walker, Helen M., Studies in the History of Statistical Method,
Chaps. II (sec. 5) and IV.



CHAPTER X 


Index Numbers 


1. THE NATURE OF INDEX NUMBERS 

An index number is a device for showing the average percentage 
change in prices, production, dependency, crime, etc., from one 
point of time to another or the variation from one geographical 
locality to another. An index number is, therefore, a kind of 
average, but it is so different from other averages that it is treated 
separately. Index numbers may be expressed as ratios or in terms 
of thousands, but generally they are expressed as percentages. It 
is difficult for the mind to grasp the relative size of crude quanti- 
ties, but comparison becomes easy if the crude quantities are ex- 
pressed as percentages of one of the quantities taken at a particular 
time or in a certain locality. Some period of time or geographical 
area is selected as the base to which quantities from all other 
periods are related in terms of percentage. The base year, month, 
or area is not selected carelessly; it serves best when it is about 
an average time or place. This base, then, becomes a sort of arbi- 
trary “normal.” As time passes it may be desirable to change the 
base period, because the original may cease to be representative or 
one nearer to the present time may be more satisfactory. 

An illustration will make clearer the value of index numbers. 
The United States Bureau of Labor Statistics publishes an index 
number for the cost of living. This is concerned with what it costs 
families to live at one period as compared with a base period, and 
covers food, clothing, rent, fuel and light, house furnishing goods, 
and miscellaneous items in the family budget. The average cost 
of living in 1913 is taken as the base period and is denoted as 
100.0. The average cost of living in each subsequent half-year is 
expressed as a percentage of the cost of living in 1913. According 
to the Bureau of Labor Statistics, the cost-of-living index in June, 
1920, was 216.5. That is, in seven years’ time there had been an
increase of 116.5 per cent in the cost of living. By the same
standard the cost-of-living index in December, 1930, was 160.7. 1 
The cost of living had declined markedly since 1920, but it was 
still 60.7 per cent higher than in 1913. If a family had retained 
its 1913 standard of living, its money income would have to be 
60.7 per cent greater in 1930. This index is for cities and probably 
does not reflect exactly the cost of living in rural areas. Professor 
Paul H. Douglas computed an index of “real wages” from 1890 
to 1926 — “real wages” refers to the comparative purchasing power 
of wages at different periods. He found that in industry as a whole 
in the United States, using 1914 as a base of 100.0, the index for 
1926 was 130.0. 2 The index of the cost of living computed by the 
Bureau of Labor Statistics stood at 174.8 in June, 1926. These 
two indexes are not quite comparable, because the cost-of-living 
index uses 1913 as a base and the real-wages index uses 1914. 
But even if 174.8 is a few points too high, it is clear that wage 
rates had not gone up as rapidly as the cost of living; conse- 
quently, there must have been a reduced standard of living among 
wage workers. If costs of living of rural people had been included 
in the cost-of-living index, it would be somewhat lower still, but 
making due allowance for this fact, up to 1926 the cost of living 
seems to have advanced more rapidly than real wages. This illus- 
tration shows the usefulness of an index number. It makes com- 
parisons easy, because the relative size of the quantities in different 
years is expressed in terms of percentage and because the base 
period preceded the World War and represented a time of fairly 
normal economic conditions. Using both index numbers, we get a 
rough idea of the trend in the standard of living among wage 
workers, a fact of great importance to social workers and to stu- 
dents of the social sciences. 
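
The reading of the cost-of-living figures in this illustration can be sketched in Python. The function names are illustrative, and the family income of $1,500 is a hypothetical figure chosen only to show the arithmetic; the index values are those quoted from the Bureau of Labor Statistics.

```python
BASE = 100.0  # the 1913 average is denoted as 100.0

COST_OF_LIVING = {"1913": 100.0, "June 1920": 216.5, "December 1930": 160.7}


def percent_change_from_base(index):
    """Percentage increase (or decrease) over the base period."""
    return index - BASE


def percent_change_between(index_from, index_to):
    """Percentage change between two periods, neither of which is the base."""
    return (index_to / index_from - 1.0) * 100.0


def income_needed(base_income, index):
    """Money income needed to keep the base-period standard of living."""
    return base_income * index / BASE


rise_1920 = percent_change_from_base(COST_OF_LIVING["June 1920"])
rise_1930 = percent_change_from_base(COST_OF_LIVING["December 1930"])
fall_1920_to_1930 = percent_change_between(
    COST_OF_LIVING["June 1920"], COST_OF_LIVING["December 1930"]
)
income_1930 = income_needed(1500.0, COST_OF_LIVING["December 1930"])  # hypothetical income

print(f"1913 to 1920: {rise_1920:+.1f} per cent")
print(f"1913 to 1930: {rise_1930:+.1f} per cent")
print(f"1920 to 1930: {fall_1920_to_1930:+.1f} per cent")
print(f"Income needed in 1930 for a $1,500 standard of 1913: ${income_1930:,.2f}")
```

Note that the change between two periods other than the base is found from the ratio of the two index numbers, not from the difference between them.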

There is one index number which has probably more general 
use than any other, and that is an index of the general price level. 
Its aim is to measure the changing purchasing power of money, 
and it is employed in any kind of study dealing with money costs 
over a period of time. Several general price indexes have been 
computed. For purposes of illustration, the Index of the General 
Price Level published by the New York Federal Reserve Bank 
will be used. Indexes of either wholesale or retail prices do not 

1 Monthly Labor Review, Vol. 32, No. 2, p. 214. 

2 Douglas, Paul H., Real Wages in the United States, 1890-1926, p. 205. 
Boston: Houghton Mifflin Co., 1930. 




accurately reflect the general price level. Because of this fact, Mr. 
Carl Snyder, of the New York Federal Reserve Bank, undertook 
to compute an index which would take into consideration all
aspects of price. His index contained four major groups of prices:
wholesale commodity prices, retail commodity prices, wages, and
rents. 3 This index uses 1913 as the base year, or 100.0, and it
includes annual indexes from 1875 to the present time; monthly
indexes are also published. According to Snyder, the index of the
general price level in 1920 was 193.0. That is, what a dollar
would purchase in 1913 would take $1.93 in 1920. By 1930 the 
index had dropped to 168.0, that is, prices had fallen; or, to put 
it another way, the purchasing power of money had risen again. 
Any comparison of money costs from one year to another requires 
the use of a price index to reduce the volume to comparable 
dollars. For example, if the operation of a hospital cost $1,000,000 
in 1913 and the same standards of service are maintained without 
effecting economies anywhere, the amount required in 1930 would 
be $1,680,000. 
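
The reduction of money costs to comparable dollars may be sketched in Python as follows. The function names are illustrative; the index values are those quoted from Snyder's index of the general price level, with 1913 as 100.0.

```python
GENERAL_PRICE_LEVEL = {1913: 100.0, 1920: 193.0, 1930: 168.0}


def to_year_dollars(amount_in_base_dollars, price_index, base=100.0):
    """Dollars of a given year needed to match an amount of base-year dollars."""
    return amount_in_base_dollars * price_index / base


def to_base_dollars(amount_in_year_dollars, price_index, base=100.0):
    """Base-year dollars equivalent to an amount stated in a given year's dollars."""
    return amount_in_year_dollars * base / price_index


dollar_1920 = to_year_dollars(1.00, GENERAL_PRICE_LEVEL[1920])
hospital_1930 = to_year_dollars(1_000_000, GENERAL_PRICE_LEVEL[1930])
hospital_back = to_base_dollars(hospital_1930, GENERAL_PRICE_LEVEL[1930])

print(f"$1.00 of 1913 corresponds to ${dollar_1920:.2f} in 1920")
print(f"Hospital cost in 1930 dollars: ${hospital_1930:,.0f}")
print(f"The same cost in 1913 dollars: ${hospital_back:,.0f}")
```

The second function is the one used when a series of annual expenditures is to be deflated so that the years may be compared on a common footing.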

By this time it will have occurred to the student that the com- 
putation of an index of the cost of living or an index of real 
wages is complicated and laborious. The computation of some 
index numbers is much simpler, because fewer quantities are 
combined. Wherever many quantities have to be combined the 
process is long. Even in the food item of the cost-of-living index 
there enters the problem of combining costs of many kinds of 
foods. A means of assigning relative importance to these items of 
food has to be found. Then the relative importance of food, 
clothing, rent, etc., has to be determined before they can be com- 
bined to compute a general index of the cost of living. Methods 
of doing this will be described later in the chapter. 

2. THE PRINCIPLE OF INDEX NUMBERS APPLIED TO SOCIAL DATA 

Index numbers were invented as measures of changes in prices, 
but in recent years they have been applied to many other kinds 
of data. The Standard Trade and Securities Service publishes a 
compilation of several hundred index numbers. Some are general 
indexes, such as indexes of general prices, but many of them are 
specific indexes, such as indexes of prices of particular commodities 
or production in special industries. The application of the principle

3 Snyder, Carl, “The Measure of the General Price Level,” Review of
Economic Statistics, February, 1928, p. 10.

of index numbers to sociological data is quite recent and not
far developed, except in certain fields which lie on the border 
between strictly economic territory and the sociological field. These 
marginal fields are represented by indexes of the cost of living 
and of real wages. Furthermore, up to the present index numbers 
have dealt largely with time series. But there is no reason why 
they cannot be applied to many kinds of sociological data and to 
non-temporal series. 

3. THE USE OF INDEX NUMBERS IN TIME SERIES 

As stated above, the principle of index numbers was first applied 
to time series, particularly to price changes over a period of time. 
But to what kinds of sociological data can the principle be applied? 
The answer is that it can be applied to any kind of quantitative 
data which change in time. For a number of years Dr. Ralph G. 
Hurlin, of the Russell Sage Foundation, has been collecting data 
from family relief agencies, and he has worked out monthly in- 
dexes. 4 These show the changing case loads of reporting relief 
agencies month by month. A glance at the charts given by Dr. 
Hurlin is sufficient to see how the case load varies at different 
times of the year. For the monthly indexes January, 1926, is 
taken as the base period, or 100.0, and the case load of each suc- 
ceeding month is expressed as a percentage of this period. The 
present writer has employed the principle of index numbers to 
measure the trend of the volume of public welfare work in In- 
diana. 5 Special indexes were computed for the number of persons 
aided per 100,000 population each year from 1900 to 1927 for 
each general type of public welfare work, including hospitals for 
the insane, penal institutions, poor asylums, child wards of the 
state, institutions for the feeble-minded, etc. A system of weights 
was devised, based upon the annual cost per person aided for each 
type of work, and then all the series were combined to form a 
general index of public welfare work in Indiana. The base year 
was 1913. In this general index corrections have been made for 
changing population and for the changing value of the dollar. 6 
The general index shows a general rise in the volume of welfare 
work carried on by the State of Indiana, even when due allowance 

4 Hurlin, Ralph G., “Indexes of Family Case Work Loads,” Survey, February
15, 1928.

5 See “Indexes of Public Welfare Work in Indiana,” Social Forces, December,
1929.

6 See Table XXVIII, p. 215, for the general index.


has been made for population and the purchasing power of the 
dollar. 

Whenever interest centers in rates of change or directions of 
change in a series of social data, the principle of index numbers 
is a possible method to determine these facts. Birth and death 
rates may be expressed in terms of index numbers with a fixed 
base period. The increase in the number of apartments in a city 
year by year is an indication of shift from the family dwelling to 
a collective type of housing; an index number showing the rate 
of change might be of considerable value to the construction in- 
dustry, to investors, to school authorities, and to students inter- 
ested in the birth rate. An index number of the work certificates 
issued to children before the legal working age would indicate to 
the issuing authorities the changing tendencies of children to leave 
school as soon as possible or to remain in school longer. The 
specific types of time series to which the principle of index numbers 
may be applied are limited only by the requirements of the problem 
in hand. 

The use of index numbers in connection with data distributed 
in space is less familiar than their use in time series, but some 
index numbers of the former kind have been constructed with 
promising results. It is more common to use ratios or rates for 
geographical areas. For example, death rates are computed for 
census tracts, cities, counties, and states. These furnish a means of 
comparing death rates, or, for that matter, the incidence of any 
other social problem. If an average death rate is taken as a sort 
of norm, then we have substantially an index number, though it 
may not be expressed as a percentage of the average rate, the latter 
corresponding to the base period. Whether or not it is desirable 
to transpose rates for spatial data is largely a matter for the 
judgment of the investigator. Two illustrations of index numbers 
based upon spatial data will be given. 

Dr. C. Luther Fry made use of index numbers to express church 
attendance in 32 counties, where he studied this subject. He took 
Salem County, New Jersey, as the base, or 100.0, and expressed 
the “attendance interest ratios” of the other 31 counties as per- 
centages of this base county. His index numbers vary from 43.7 
in Pend Oreille County, Washington, to 191.3 in Monroe County, 
Georgia. Alongside of his index numbers for “attendance interest 
ratios” he has placed index numbers for the “membership ratios” 
in the counties. He computes the degree of correlation between
attendance interest and membership ratios and finds it very high.
Thus, the computation of index numbers here is done partly to 
indicate the variation in each series, but also to provide a basis for 
computing a coefficient of correlation. 7 

Professor C. Horace Hamilton has made another use of an 
index: to measure the relative roughness of topography. In the 
published report of his study of the relation of topography to 
social development in certain counties of Virginia he has not indi- 
cated whether or not he adopted a base county to represent 100.0. 
But it is apparent that his figures lend themselves to conversion 
to the conventional forms of index numbers. He says: “The social 
development of that area [Appalachian Highlands] is limited by 
its topography more than by any other one factor. In making 
social studies of such mountainous areas or in planning institu- 
tional development in them, it is desirable to have an accurate 
method of measuring the influence of topography. The problem 
resolves itself into the construction of an index of topography 
which can be used in making correlations with various social and 
economic conditions.” 8 Professor Hamilton took a topographical 
map and drew on it vertical and horizontal lines three-eighths of 
an inch apart, this distance being equivalent to 2.5 miles. In each 
county the number of times the horizontal and vertical lines 
crossed a 500-foot contour interval or a stream was counted. The 
total count for the county was then divided by one-hundredth of 
the number of square miles in the county. This quotient is his index 
of topography. He found some high correlations between his in- 
dexes and other social factors in the counties, which partly demon- 
strates the usefulness of his index. By selecting a base county, his 
indexes could easily be transposed into conventional index numbers 
which could be put in an array to show the range of variations in 
topography for all counties in Virginia. 
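
Professor Hamilton's quotient, and its transposition into conventional index numbers on a base county, can be sketched in Python. The function names are illustrative, and the three counties with their counts and areas are hypothetical figures invented for the example; they are not taken from his study.

```python
def topography_index(crossings, square_miles):
    """Count of contour and stream crossings per one-hundredth of the county's area."""
    return crossings / (square_miles / 100.0)


def to_index_numbers(values, base_key):
    """Express every value as a percentage of the value for the base county."""
    base = values[base_key]
    return {key: round(100.0 * value / base, 1) for key, value in values.items()}


# Hypothetical data: (number of crossings counted, area in square miles).
COUNTIES = {
    "County A": (120, 400.0),
    "County B": (45, 300.0),
    "County C": (210, 350.0),
}

raw_indexes = {
    name: topography_index(crossings, area)
    for name, (crossings, area) in COUNTIES.items()
}
index_numbers = to_index_numbers(raw_indexes, base_key="County A")

for name in COUNTIES:
    print(f"{name}: quotient {raw_indexes[name]:5.1f}, index number {index_numbers[name]:6.1f}")
```

The choice of the base county is a matter of judgment; a county of about average roughness would serve the purpose described earlier in the chapter.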

The two illustrations above suggest how the usefulness of index 
numbers will depend upon the problem in hand, but there is little 
doubt that this type of index numbers can become of much greater 
value in the future. 

If the principle of index numbers is going to be used in the 
study of a problem, the collection of the requisite data comes in 
for early attention. The purpose of the index number will determine

7 Fry, C. Luther, Diagnosing the Rural Church, p. 111. New York: George H.
Doran Co., 1924.

8 Hamilton, C. Horace, “A Statistical Index of Topography,” Social Forces,
Vol. 9, No. 2, pp. 204, 205.

the criterion for collection of data. No formula for the
computation of an index number will yield reliable results unless 
the data collected are suitable for the purpose. The worker must 
carefully define his purpose at the beginning of his work, and it 
should be stated as concretely as possible. For example, Professor 
Paul H. Douglas undertook to compute an index of relative living 
costs in non-agricultural areas. An urban index is desired; that 
limits the collection of data to cities. But all the items entering 
into the cost of living of a family had to be considered, and it was 
necessary to determine the relative importance of food, clothing, 
rent, etc., in order to weight the expenditures for quantities used. 
Appropriate weights had then to be selected. But he found that 
the relative importance of different items in the family budget 
changes over a period of time; therefore, it was necessary to 
change the weights after a certain year in the series was reached. 
This fact came to light in the process of collecting data for the 
index. 9 No mechanical rule can take the place of logic. The
investigator must take care to understand the degree of homogeneity
he is obtaining in his data and must observe the changing im- 
portance of the factors involved. This statement suggests that the 
accuracy of any index number depends upon the judgments the 
investigator made in the early stages of his work, and that it is 
highly relative. That is a fact. The validity of an index number is 
determined in large measure by the technical skill and painstaking 
care of the investigator. 

Most index numbers are based upon samples of data in a statis- 
tical universe and not upon all of the existing data. If the index 
number is to represent approximately the actual situation, the 
sample data must be representative of the statistical universe 
under consideration. This raises the question of random sampling. 
A random sample of data in a given field is such a selection of 
data as to eliminate as completely as possible all influences except 
chance. For example, a random sample of the distribution of 
library borrowers in a city could be made by taking every fifth 
name in an alphabetical index of the borrowers. A random sample 
of relief agencies in New York might be made in the same way, 
but it happens that there are a few large relief agencies and a 
great many small ones. A random sample based upon alphabetical 
arrangement of the names of the agencies might not adequately 
represent the whole relief field because many small agencies and

9 Douglas, Paul H., op. cit., Chap. IV.

possibly one large one would be included in the sample. The
method of proportional sampling might be preferable, if the whole 
relief field is to be represented fairly in an index of relief in New 
York. That is, agencies would be consciously selected and not left 
to chance; the judgment of the worker would determine the rela- 
tive importance of the relief agencies in the whole field and would 
accordingly select the agencies to be used. In the computation of 
index numbers this is probably the better method to pursue; that 
is, examine the field carefully and then choose the data which 
give proportional representation to all types in the field. 10 
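
The contrast between the two methods of selection may be sketched in Python. Everything in the sketch is hypothetical: the twenty-five agencies, their division into five large and twenty small, and the function names are invented for illustration. Taking the first agencies in each group merely stands in for the worker's judgment, which no program can supply.

```python
def every_nth(items, n, start=0):
    """Take every nth item from an ordered list, as with every fifth name."""
    return items[start::n]


def proportional_sample(strata, fraction):
    """Take the same fraction from each type, so that all types are represented."""
    sample = {}
    for name, members in strata.items():
        count = max(1, round(len(members) * fraction))
        sample[name] = members[:count]  # placeholder for a deliberate choice
    return sample


# Hypothetical alphabetical list: 25 agencies, of which 5 are large.
AGENCIES = [
    (f"Agency {i:02d}", "large" if i % 5 == 3 else "small") for i in range(1, 26)
]

by_fifth_name = every_nth(AGENCIES, 5)

strata = {
    "large": [a for a in AGENCIES if a[1] == "large"],
    "small": [a for a in AGENCIES if a[1] == "small"],
}
proportional = proportional_sample(strata, fraction=0.2)

print("Every fifth name:", [size for _, size in by_fifth_name])
print("Proportional:    ", {k: len(v) for k, v in proportional.items()})
```

In this invented list the every-fifth-name selection happens to miss the large agencies altogether, which is the danger the text describes; the proportional selection includes one large agency for every four small ones.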

The question of primary and secondary data arises in the con- 
struction of an index number just as it does in other statistical 
problems. If the investigator collects the original data, he knows 
by experience a good deal about the homogeneity and appropriate- 
ness of his material. But some of his material may be secondary. 
What, then, is he to do? He must make some inquiry into the 
method of collection of the data and estimate their appropriate- 
ness for his own project. Rarely does an investigator construct an 
index number from nothing but primary data. His prices are taken 
from published tables, his weights to be used in measuring the 
cost of living are taken from some independent investigation, or 
his dependency data are taken from published reports. He must 
have some understanding of how these data were gathered and 
what standards of accuracy were observed. The construction of an 
index number would often be far too expensive if only primary 
data were used. Secondary data are satisfactory, but they must be 
used critically. 


4. TYPES OF INDEX NUMBERS 

No effort is made in this chapter to discuss a wide variety of 
formulas but merely to illustrate a few of those which may be used 
most readily by the student. For extensive discussions of the 
validity of different formulas the student is referred to Fisher’s 
The Making of Index Numbers, and to Professor Willford I.
King’s more recent book, Index Numbers Elucidated. In this
chapter the elementary methods of constructing index numbers 
will be described. 

The simplest form of comparison of quantities is the crude 
figures. The quantities are added and allowed to stand without 

10 For further discussion of this point, see King, Willford I., Index Numbers
Elucidated, pp. 64-66. New York: Longmans, Green and Co., 1930.


reduction to relatives and without the use of weights. This in 
reality is not an index number, because by definition an index 
number shows relative change in magnitude. For purposes of 
illustration and comparison the same data will be used in all the 
formulas. The data will be the average amount of relief per allow- 
ance case given by three family relief agencies of New York City 
during a period of four years, 1927 to 1930. 

TABLE LIV

Amount of Relief per Allowance Case in Three New York Family Relief
Agencies 1

Agency      Relief per    Relief per    Relief per    Relief per
            Case, 1927    Case, 1928    Case, 1929    Case, 1930

No. 1         $44.85        $41.45        $44.45        $46.11
No. 2          49.29         49.54         51.00         52.95
No. 3          47.90         52.76         53.49         53.97

Total        $142.04       $143.75       $148.94       $153.03


1 From data compiled by the Department of Statistics of the Russell Sage Founda- 
tion. Indexes computed for these relief agencies might just as well have been computed 
in terms of case load; this would remove the changing price factor, and it would repre- 
sent volume of work just as well. 


Examination of the column totals reveals the fact that there has 
been an increase in the amount of relief per allowance case, but 
it is difficult to get a definite conception of the amount of change 
from year to year. The crude figures are too large, and they are 
not in any way related to each other. It is possible to make com- 
parisons between the annual totals, but the percentage change can 
only be guessed. We need the totals expressed in some form that 
reveals the relative amount of relief per allowance case. 

The simplest form of an index number consists of relatives based 
upon the sum of aggregate values unweighted. The formula for 
this index number may be expressed as follows:

$$I = \frac{\sum q_1}{\sum q_0} \times 100$$

$I$ = index number for the given year
$\sum q_0$ = sum of the quantities in the base year
$\sum q_1$ = sum of the quantities in the given year

If there is only one quantity in each year, then the summation 
sign is omitted from the formula. For example, if Agency No. 1 
were the only agency being considered, there would be no summation.
But in Table LIV there are three quantities. The totals of the
columns, then, will be used in the formula, as follows, using 1927
as the base year, or $q_0$:

$$I = \frac{143.75}{142.04} \times 100 = 101.2, \text{ index for 1928}$$

The indexes for the other years are 104.9 and 107.7, respectively.
It is easy to grasp the significance of the changes in allowances,
when reference is had to these index numbers. The increase in
allowances over the base year was 1.2 per cent in 1928, 4.9 per
cent in 1929, and 7.7 per cent in 1930. The sharpest rise occurred
in 1929, but allowances are still going up. In view of the fact that 
the purchasing power of money was rising during this period, the 
increasing amounts of allowances appear to reflect a more liberal 
policy of relief giving. This might not be true in 1930, because 
the depression may have so depleted the slender resources of 
families that more relief had to be given for that reason. What- 
ever the explanation of the increasing amounts of allowances, the 
index numbers show that an increase is occurring, and that is their 
function. 
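
The arithmetic of the unweighted sum of aggregates can be sketched in a few lines of Python. This is only an illustration: the relief figures are those given for the three agencies, while the names `relief` and `aggregate_index` are assumptions introduced here and are not part of the text.

```python
# Relief per allowance case for Agencies No. 1, 2, and 3, as given in the text.
relief = {
    1927: [44.85, 49.29, 47.90],
    1928: [41.45, 49.54, 52.76],
    1929: [44.45, 51.00, 53.49],
    1930: [46.11, 52.95, 53.97],
}


def aggregate_index(data, base_year):
    """Unweighted index: 100 * (sum for the given year) / (sum for the base year)."""
    base_total = sum(data[base_year])
    return {year: 100.0 * sum(values) / base_total for year, values in data.items()}


indexes = aggregate_index(relief, 1927)
for year in sorted(indexes):
    print(year, round(indexes[year], 2))
```

Carried to two decimal places the 1929 figure comes out at about 104.86, which the text reports as 104.8; the 1928 and 1930 figures round to 101.2 and 107.7.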

Another method of computing an unweighted index for these 
data is that known as the average of relatives. The quantity for 
each agency in the base year is used as the base for computing 
relatives for that agency. Then the arithmetic mean of the rela- 
tives for each year is found. The variation in the formula may be 
expressed thus: 


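$$I = \frac{\sum \left( \frac{q_1}{q_0} \times 100 \right)}{N}$$

in which N is the number of relatives averaged (here, the number of agencies).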

TABLE LV

Amount of Relief per Allowance Case in Three New York Family Relief
Agencies and the Relatives Based upon 1927

                  1927                1928                1929                1930
Agency       Relief  Relative    Relief  Relative    Relief  Relative    Relief  Relative
No. 1       $ 44.85    100.0    $ 41.45     92.4    $ 44.45     99.1    $ 46.11    102.8
No. 2         49.29    100.0      49.54    100.5      51.00    103.5      52.95    107.4
No. 3         47.90    100.0      52.76    110.1      53.49    111.7      53.97    112.7

Total       $142.04    300.0    $143.75    303.0    $148.94    314.3    $153.03    322.9
Average       47.35    100.0      47.92    101.0      49.65    104.8      51.01    107.6




Table LV shows how this type of index number is computed. 
The index numbers are substantially the same as when computed
by the method of the sum of aggregates, though they are slightly
lower by the method of average of relatives. If one or the other
of the two preceding methods is to be used, the first is preferable
because it requires less arithmetical work. In either case, the
annual rise in the cost of allowances is made clear.
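
The average of relatives may be sketched in the same way. The sketch below rounds each relative to one decimal place before averaging, which is how the relatives are printed in Table LV; that rounding step, like the names `relief` and `average_of_relatives`, is an assumption made here for illustration.

```python
# Relief per allowance case for Agencies No. 1, 2, and 3, as given in the text.
relief = {
    1927: [44.85, 49.29, 47.90],
    1928: [41.45, 49.54, 52.76],
    1929: [44.45, 51.00, 53.49],
    1930: [46.11, 52.95, 53.97],
}


def average_of_relatives(data, base_year, places=1):
    """Arithmetic mean of the agency relatives, each agency's base-year figure being 100."""
    base = data[base_year]
    result = {}
    for year, values in data.items():
        # One relative per agency, rounded as in the printed table.
        relatives = [round(100.0 * q1 / q0, places) for q1, q0 in zip(values, base)]
        result[year] = sum(relatives) / len(relatives)
    return result


indexes = average_of_relatives(relief, 1927)
for year in sorted(indexes):
    print(year, round(indexes[year], 1))
```

Rounded to one decimal, the results agree with the "Average" row of relatives in Table LV.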

However, an examination of the table reveals the fact that the 
rising cost of allowances proceeds at different rates in the three 
agencies. This fact affects the index numbers as previously com- 
puted, but we are not sure that Agency No. 3 should affect the 
result as much as it does, or possibly it should affect it more. This 
result can be tested by devising a system of weights, based upon 
the number of allowance cases handled by each agency in the base 
year. Then, if the work of these three large relief agencies in New 
York can be assumed fairly to represent the policies of relief 
agencies in allowance cases, we shall have an index number re- 
flecting the changing cost of allowance cases in the City of New 
York. This assumption may or may not be true; it would have 
to be tested by a study of some of the smaller relief agencies, but 
for purposes of illustration we shall make the assumption. 

A weighted index number may be computed by the method of 
either the sum of aggregates or the average of relatives. Both 
methods will be illustrated for purposes of comparison, but first 
a system of weights must be determined. A convenient method of 
weighting the cost of allowances in this problem is to use the 
average monthly allowance case load in each agency, and then 
compute the percentage which each agency load constitutes of the 
sum of all the case loads. The following table shows this process: 


TABLE LVI

Average Monthly Allowance Case Load of Agencies, and
the Weights Expressed as Percentages of the Total Case
Loads

Agency        Monthly Case Load     Percentage of Total (the Weights)
  (1)                (2)                          (3)
Total              1,197                        100.0
No. 1                304                         25.4
No. 2                248                         20.7
No. 3                645                         53.9



TABLE LVII

[Computation of the index numbers by the method of weighted aggregates. The body of this table is illegible in the source.]


In this problem we shall use percentages as weights. The absolute 
numbers in column (2) could be used with about the same ease 
but, if these numbers were large, they would be cumbersome. In 
such cases percentage, or some other ratio indicating relative 
importance, is more convenient. 

The following formula indicates the method of computing 
index numbers from weighted aggregates: 


$$I = \frac{\sum q_1 W_1}{\sum q_0 W_0} \times 100$$

in which $q_1$ and $q_0$ have the same meaning as in the previous for-
mula and $W_1$ and $W_0$ are the corresponding weights. Table LVII
shows the method of computation.
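
A minimal sketch of the weighted-aggregates computation follows. It assumes the same percentage weights in every year (the base-year case loads) and takes the weight for Agency No. 3 as 53.9, the remainder after the weights 25.4 and 20.7 given for the other two agencies. The function and variable names are illustrative only.

```python
# Relief per allowance case for Agencies No. 1, 2, and 3, as given in the text.
relief = {
    1927: [44.85, 49.29, 47.90],
    1928: [41.45, 49.54, 52.76],
    1929: [44.45, 51.00, 53.49],
    1930: [46.11, 52.95, 53.97],
}
# Percentage weights; 53.9 is assumed as 100.0 - 25.4 - 20.7.
weights = [25.4, 20.7, 53.9]


def weighted_aggregate(values, weights):
    """Sum of each quantity multiplied by its weight."""
    return sum(q * w for q, w in zip(values, weights))


def weighted_aggregate_index(data, weights, base_year):
    """100 * (weighted aggregate of the given year) / (weighted aggregate of the base year)."""
    base_total = weighted_aggregate(data[base_year], weights)
    return {
        year: 100.0 * weighted_aggregate(values, weights) / base_total
        for year, values in data.items()
    }


indexes = weighted_aggregate_index(relief, weights, 1927)
for year in sorted(indexes):
    print(year, round(indexes[year], 1))
```

With these weights the base-year aggregate works out to about 4741.30, whereas the text quotes 4742.01. The difference does not affect the indexes at one decimal place, and the aggregates for the other three years agree with the figures quoted in the text.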

The effect of weighting is to increase the size of the index num- 
bers. If our weighting system is sound, it is evident that the un- 
weighted index does not properly represent the changing amounts 
of allowances. When an index number is carried through a long 
period of years, the relative importance of its items often changes. 
When these changes become so large as materially to affect the 
results, the weighting system should be revised and applied from 
the point at which the changes became important. In this prob- 
lem that could be determined each year by simply computing the 
percentage of the cases handled by each agency. Another way of 
determining the weights, when the index deals with data of past 
years, is to take the average annual percentage of cases carried by 
each agency for the entire period. Of course, when another year 
passes and the index number is computed for that year, either the 
old system of weights will have to be accepted as adequate or a 
new system computed and the index numbers revised for the 
entire period. It is perhaps easier to use different weights each 
year, based upon the annual allowance case loads of the agencies. 
Index numbers computed for other types of data require the same 
attention to weighting. 

Another type of weighted index number is known as the aver- 
age of relatives weighted. We shall illustrate the method of 
computing this kind of index number and compare the results 
with those obtained by the method of weighted aggregates. The
variation in the formula is as follows:

$$I = \frac{\sum \left( \frac{q_1}{q_0} \times 100 \right) W}{\sum W}$$



TABLE LVIII

[Computation of the index numbers by the method of the average of relatives weighted. The body of this table is illegible in the source.]


The relatives for individual agencies are taken from Table LV. 
Each relative is multiplied by its weight. The sum of the weighted 
relatives in each year is then divided by the sum of the weights, 
which is 100.0, and the resulting index numbers are almost iden-
tical with the results obtained by the method of weighted aggre-
gates, as was to be expected. One method is as good as the other, 
but the method of weighted aggregates requires somewhat less 
arithmetical work. 
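
The procedure just described can be sketched as follows, using the relatives printed in Table LV. The weight of 53.9 for Agency No. 3 is assumed as the remainder after 25.4 and 20.7, and the names are illustrative only.

```python
# Relatives for Agencies No. 1, 2, and 3 on the 1927 base, from Table LV.
relatives = {
    1927: [100.0, 100.0, 100.0],
    1928: [92.4, 100.5, 110.1],
    1929: [99.1, 103.5, 111.7],
    1930: [102.8, 107.4, 112.7],
}
# Percentage weights; 53.9 is assumed as 100.0 - 25.4 - 20.7.
weights = [25.4, 20.7, 53.9]


def weighted_average_of_relatives(values, weights):
    """Sum of the weighted relatives divided by the sum of the weights."""
    return sum(r * w for r, w in zip(values, weights)) / sum(weights)


indexes = {
    year: weighted_average_of_relatives(values, weights)
    for year, values in relatives.items()
}
for year in sorted(indexes):
    print(year, round(indexes[year], 1))
```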

At times it may be desirable, for special reasons, to shift the 
base year. If an index number extends over a number of years, 
conditions may so change that the original base year is unrepre- 
sentative of the period as a whole. In such cases the base year may 
be changed. If the index number has been constructed by the 
method of the average of relatives weighted, a good deal of re- 
computation is necessary to accomplish this. On the other hand, 
if the index number has been computed by the method of weighted 
aggregates, it is simple to shift the base year. All that is required 
is to select the new base year and then divide all the sums of 
aggregates by the sum for the new base year. For example, if it 
were desired to make 1929 the base year in the illustration given 
in Table LVII, we would simply divide 4742.01, 4922.07, and
5176.24 by 5067.84, and the new index numbers would be as
follows: 1927, 93.5; 1928, 97.1; 1929, 100.0; 1930, 102.1. A
change in the base year is equivalent to a change in the weights,
because the relative size of the items in the new base year differs
from their relative size in the old base year. 11 Hence, if it seems wise to shift
the base year, a consideration of the weighting system is required, 
and new weights may have to be devised. 
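
Shifting the base of an index of weighted aggregates amounts to a single division per year, as the following sketch shows. The four aggregates are those quoted in the text; the function name `shift_base` is an assumption for illustration.

```python
# Sums of weighted aggregates quoted in the text for Table LVII.
aggregates = {1927: 4742.01, 1928: 4922.07, 1929: 5067.84, 1930: 5176.24}


def shift_base(sums, new_base_year):
    """Divide every year's aggregate by the aggregate of the new base year."""
    base = sums[new_base_year]
    return {year: 100.0 * total / base for year, total in sums.items()}


shifted = shift_base(aggregates, 1929)
for year in sorted(shifted):
    print(year, round(shifted[year], 2))
```

Carried to two decimals the 1927 figure is about 93.57, which the text gives as 93.5; the other years round to the values in the text.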

Index numbers may also be computed by the method of the 
geometric average of relatives or of aggregates. The nature of 
the geometric average is to show proportional differences. When 
it is used, the resulting index number is likely to be somewhat 
lower, except in the base year, than the index determined by the 
arithmetic average. The principal advantage of an index number 
in which the geometric average is used is that the base may easily 
be shifted. That will be illustrated by the problem which follows, 
and the formula may be written thus: 

$$\log I = \frac{\sum \log \left( \frac{q_1}{q_0} \times 100 \right)}{N}$$

11 See King, W. I., op. cit., pp. 23-25, for a demonstration of this fact.



TABLE LIX

[Computation of the index numbers by the method of the geometric average of relatives. The body of this table is illegible in the source.]


The indexes as given in Table LIX are slightly different from 
those computed by other methods, but the differences are not 
great. With other data, however, the differences may be considerable. Suppose,
now, that it is desired to shift the base to 1929. This is done by
computing the relatives for the different years in terms of 1929
as the base, finding the logarithms for these relatives, and then
taking the mean of the logarithms for each year. That is consid-
erable work. The same results may be obtained, as Chaddock has
pointed out, 12 by using the index 104.6 of 1929 as 100.0 per cent
and dividing each of the other indexes in Table LIX by it. The
resulting indexes on the new base are: 1927, 95.6; 1928, 96.3;
1929, 100.0; 1930, 102.9. If these indexes are plotted by the
side of the indexes given in that table, it will be seen that the 
curves are parallel. That is, using the 1927 base, the ratio of the 
index for 1927 to the index for 1928 is .993, and, using the 1929 
base, the ratio of the index for 1927 to the index for 1928 is .993. 
The geometric average shows proportional change, and the shift- 
ing of the base year does not affect the proportions of the index 
numbers when computed by the method of the geometric average 
of relatives. 
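
The geometric average of relatives, and the equivalence of the two ways of shifting its base, can be checked with a short sketch. It uses the relatives printed in Table LV and common logarithms; the names are illustrative assumptions. Because it works with unrounded intermediate indexes, the last decimal of a rebased figure may differ slightly from the values quoted above.

```python
import math

# Relatives for Agencies No. 1, 2, and 3 on the 1927 base, from Table LV.
relatives = {
    1927: [100.0, 100.0, 100.0],
    1928: [92.4, 100.5, 110.1],
    1929: [99.1, 103.5, 111.7],
    1930: [102.8, 107.4, 112.7],
}


def geometric_index(values):
    """Antilogarithm of the mean of the logarithms of the relatives."""
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log


index_1927_base = {year: geometric_index(v) for year, v in relatives.items()}

# Short way: divide each index by the index of the new base year.
short_way = {
    year: 100.0 * value / index_1927_base[1929]
    for year, value in index_1927_base.items()
}

# Long way: recompute every agency's relative on the 1929 base, then average the logs.
rebased = {
    year: [100.0 * r / r_new for r, r_new in zip(v, relatives[1929])]
    for year, v in relatives.items()
}
long_way = {year: geometric_index(v) for year, v in rebased.items()}

for year in sorted(short_way):
    print(year, round(short_way[year], 2), round(long_way[year], 2))
```

The two columns printed agree for every year, which is the property that makes the base of a geometric index easy to shift.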

The illustration just given is unweighted, but this average may 
be used equally well in the computation of a weighted index 
number. The logarithm of the relative is multiplied by the appro- 
priate weight. The sum of the weighted logarithms for a given 
year, or other period, is divided by the sum of the weights. The 
quotient is the logarithm of the weighted index number desired. 
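
The weighted form described in this paragraph can be sketched as below. The weights are the percentage weights used earlier, with 53.9 for Agency No. 3 assumed as the remainder after 25.4 and 20.7. The text gives no weighted geometric results, so none are quoted here.

```python
import math


def weighted_geometric_index(relatives, weights):
    """Antilog of (sum of weight * log relative) / (sum of weights)."""
    weighted_logs = sum(w * math.log10(r) for r, w in zip(relatives, weights))
    return 10 ** (weighted_logs / sum(weights))


relatives_1928 = [92.4, 100.5, 110.1]   # from Table LV
weights = [25.4, 20.7, 53.9]            # 53.9 assumed as 100.0 - 25.4 - 20.7

weighted_1928 = weighted_geometric_index(relatives_1928, weights)
equal_1928 = weighted_geometric_index(relatives_1928, [1, 1, 1])
print(round(weighted_1928, 1), round(equal_1928, 1))
```

With equal weights the function reduces to the unweighted geometric average. With the case-load weights the 1928 index falls a little below the weighted arithmetic figure of 103.6, as would be expected of a geometric average.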

5. THE “BEST” FORMULA

Much effort has been expended in trying to find an “ideal for- 
mula” for the construction of index numbers. Lately, however, 
less attention has been given to this question, and Professor King, 
one of the most recent writers on the subject, contends that there 
is no “best” formula. 13 The researches whose object was to dis-
cover an ideal formula may have an historical explanation which
is to be found in the history of the uses to which index numbers 
have been thought applicable. In the beginning of the construction 
and use of index numbers the interest was almost exclusively in 
prices. An index number was synonymous with a measure of price 

12 Chaddock, op. cit., pp. 185-187.

13 King, op. cit., pp. 219, 220.





variation. The “ideal formula” which has received the most atten- 
tion is Irving Fisher’s: 

$$I = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}}$$

The p’s refer to prices for the base year and for another given
year, and the q’s refer to the quantities sold at the given price in
the base year and in the other year considered. Obviously this ideal
formula is in the old tradition of index numbers as measures of 
prices. The emphasis Fisher places upon index numbers as meas- 
ures of prices shows his leaning toward the older conception of 
index numbers, though he distinctly states that index numbers 
may be used for other purposes. Nevertheless, he draws his illus- 
trations for the numerous formulas from the field of prices. As 
long as prices furnished the data for index numbers, it was rea- 
sonable that a search should be made for a formula which would 
be “best” under all circumstances for handling this class of data. 
But when indexes of physical production, of dependency, of em- 
ployment, of church attendance, etc., began to appear, the pur- 
pose of index numbers had so changed that it became apparent 
that the purpose of an index number, even when it deals with 
prices, should determine the formula. 
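
Fisher's formula is the geometric mean of two price ratios, one weighted by base-year quantities and one by given-year quantities. The sketch below illustrates it with entirely hypothetical prices and quantities for three commodities; none of these figures come from the text, and the function name is an assumption.

```python
import math


def fisher_ideal(p0, q0, p1, q1):
    """Square root of the product of the base-weighted and given-year-weighted price ratios."""
    base_weighted = sum(p * q for p, q in zip(p1, q0)) / sum(p * q for p, q in zip(p0, q0))
    given_weighted = sum(p * q for p, q in zip(p1, q1)) / sum(p * q for p, q in zip(p0, q1))
    return math.sqrt(base_weighted * given_weighted)


# Hypothetical data for illustration only.
p0 = [1.00, 2.50, 0.40]   # base-year prices
q0 = [10, 4, 25]          # base-year quantities
p1 = [1.10, 2.40, 0.50]   # given-year prices
q1 = [9, 5, 22]           # given-year quantities

forward = fisher_ideal(p0, q0, p1, q1)
backward = fisher_ideal(p1, q1, p0, q0)
print(round(100 * forward, 1), round(forward * backward, 6))
```

Interchanging the two years gives the reciprocal of the index, so the product of the forward and backward figures is 1.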

The latter is the contention of Professor King. He points out 
that index numbers are means of answering specific questions about 
data. “. . . the nature of the question asked determines absolutely 
the mathematical procedure which must be used in arriving at 
the answer, in other words, no essential change in the method of 
solution is permissible except when the question to be answered 
changes.” 14 In order to make King’s position clear, as it applies 
to the data for allowance cases used above, we may restate two of 
his questions so that they apply to our data: 

1. Considering that the work of each agency is of equal im- 
portance, what was the average ratio of allowances in 1928 
to allowances in 1927? 

2. How would the total amount of allowances in 1928 com- 
pare with the total allowances in 1927, if the same number 
of allowance cases had been handled in the two years?

14 Op. cit., p. 26. See also pp. 51-56.

The first question is answered by finding separately for each agency
the ratio of the amount of allowances in 1928 to that in 1927 and
finding the average of the ratios for 1928. That is a simple arith-
metic average of relatives unweighted. The answer to the second
question is found by multiplying the mean allowance in each case 
for each agency in both years by the number of allowance cases for 
the agency in 1927. The products are added for each year, and 
then the ratio of the sum for 1928 to the sum for 1927 is found. 
This is the method of weighted aggregates. The two questions are 
different, and the answers are different. 15 Whenever an index 
number is required for a group of data, the first question to be 
asked is, not what formula to use, but what purpose the index 
number is to serve. When that is answered, the formula, or 
mathematical procedure, will be determined. As suggested above, 
various questions may be asked about the same data, and a differ- 
ent mathematical procedure is required to answer each. There is, 
then, no serious question of whether one formula is per se more
accurate than another; the formula is correct if it answers the
question asked. 

The index numbers derived for the data on allowances by vari- 
ous methods differ more or less. These differences are shown in 
Table LX: 


TABLE LX

Comparison of Indexes for Allowance Cases Computed by Different Methods

          Sum of        Average of     Sum of       Average of     Geometric Average
          Aggregates    Relatives      Weighted     Relatives      of Relatives
          Unweighted    Unweighted     Aggregates   Weighted       Unweighted
Year         (1)           (2)            (3)          (4)              (5)
1927        100.0         100.0          100.0        100.0            100.0
1928        101.2         101.0          103.8        103.6            100.7
1929        104.8         104.8          106.9        106.8            104.6
1930        107.7         107.6          109.2        109.1            107.6

15 Ibid.

The weighted indexes are somewhat higher than the unweighted
indexes in each year above the base year, though the differences
are not large. The differences among the unweighted index
numbers are slight, and likewise the difference between the weighted
index numbers. The unweighted index computed by the method
of the geometric average is slightly smaller than either of the
other unweighted indexes except for 1930, when it is the same
as the index in column (2). True to one decimal place, the indexes
for 1930 in columns (2) and (5) are the same, but, if carried to
two decimal places, the one based upon the geometric average is
slightly smaller. The geometric average minimizes extremes, and
the effect is to give an index slightly smaller than other methods
which utilize the arithmetic average or any other average except
the harmonic mean. The use of the harmonic mean gives the low-
est index of any of the averages. The differences between indexes
computed by the above five methods will not always be as slight
as they appear here. Consequently, the student should not con-
clude that it is a matter of indifference as to which one he uses.
The one he selects for his use will depend upon what question he
seeks to answer about his data.
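
The ordering of the three averages can be verified on the 1928 relatives from Table LV with Python's standard `statistics` module. This is a small check under that assumption, not a computation taken from the text.

```python
import statistics

# 1928 relatives for Agencies No. 1, 2, and 3, from Table LV.
relatives_1928 = [92.4, 100.5, 110.1]

arithmetic = statistics.mean(relatives_1928)
geometric = statistics.geometric_mean(relatives_1928)
harmonic = statistics.harmonic_mean(relatives_1928)

print(round(arithmetic, 2), round(geometric, 2), round(harmonic, 2))
```

For any set of unequal positive relatives the harmonic mean is the smallest of the three and the arithmetic mean the largest, which is the relation stated above.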

A number of “tests” for the validity of index numbers have
been proposed (e.g., the circular, factor-reversal, commodity-reversal,
and time-reversal tests), but none has been entirely satisfactory, and
King, as indicated above, maintains that as tests they are without
merit. He would rest the validity of a formula upon the question
of whether or not, when applied to the data, it answers the question 
asked. Truman L. Kelley has proposed the following tests for 
validity: the smallness of the probable error of the sample used, 
whether or not the results parallel habitual modes of thinking of 
the problem, proportionality of the index to the relatives, ease 
of entering or withdrawing items from the list of quantities used, 
ease of change of base period, and ease of change of unit of meas-
urement in the list. On the basis of these tests Kelley finds that
index numbers computed on the basis of the weighted geometric
mean or the weighted median are the most reliable. 16 The so-called
ideal formula proposed by Professor Fisher 17 requires complete 
data for its use; these are rarely obtainable. The principle laid 
down by King that the purpose of the index number determines 
the mathematical procedure seems to be as sound as any yet 
brought forward. Much more difficult to determine than the for- 
mula are the representativeness and adequacy of the sample of 
data. If these can be obtained and the purpose of the investigator 
is clearly stated, the formula is easily found. 

16 Op. cit., pp. 341-347.

17 See above, p. 318.

6. EXERCISES

1. The following table gives the cost of maintenance of state in-
stitutions in Indiana from 1900 to 1930 inclusive:

TABLE LXI

Cost of Maintenance of State Institutions in Indiana, 1900-
1930, in Actual Dollars 1

Year          Cost        Year          Cost
1900     $1,290,790       1916     $2,794,867
1901      1,379,860       1917      3,016,533
1902      1,382,397       1918      3,228,806
1903      1,425,753       1919      3,306,288
1904      1,525,741       1920      3,748,893
1905      1,555,787       1921      4,026,403
1906      1,620,454       1922      4,049,277
1907      1,540,985       1923      4,173,881
1908      1,800,470       1924      4,154,984
1909      1,932,381       1925      4,600,119
1910      1,991,005       1926      4,544,566
1911      2,109,833       1927      4,765,332
1912      2,282,191       1928      5,060,151
1913      2,318,348       1929      5,145,641
1914      2,445,017       1930      5,392,771
1915      2,614,937

1 Indiana Bulletin of Charities and Corrections, July 1931, p. 353.


(a) Use an index of the general price level, such as that of 
the New York Federal Reserve Bank, and adjust the 
actual expenditures to comparable dollars. The Federal 
Reserve Index of the General Price Level can be found 
in the statistical reports of the Standard Trade and Secu- 
rities Service. 

(b) Make a graph showing the curves of actual dollars ex- 
pended and the adjusted dollars expended. 

(c) What was the percentage increase in expenditures between 
1900 and 1930 in actual dollars? What was the percentage 
increase in expenditures in adjusted dollars? 

(d) The population of Indiana at the time of the census from 
1900 to 1930 was as follows: 1900, 2,516,462; 1910,
2,700,876; 1920, 2,930,390; 1930, 3,238,503. What was
the per capita expenditure in each of these decennial years 
in actual dollars and in adjusted dollars? Has there been 
a marked increase in expenditures for the maintenance of 
state institutions during this period? 

(e) What inferences might be drawn from the foregoing 
analysis of expenditures for maintenance of state institu- 
tions regarding the frequent complaints about the rising 
tax rate? 

2. The following table gives the number of mental patients in
state hospitals of the United States at a number of different
times between 1880 and 1928:

TABLE LXII

Number of Mental Patients in State Hospitals in the
United States in Specified Years 1

Year        Patients
1880          31,973
1890          67,754
1904         129,222
1910         159,096
1922         222,406
1923         229,664
1926         246,486
1927         256,858
1928         264,226

1 Mental Patients in State Hospitals, United States Bureau
of the Census, 1930, p. 6.

(a) Compute the number of mental patients in each year per 
100,000 population of continental United States. The 
population will have to be estimated for intercensal years. 

(b) Construct index numbers for the rates of mental patients 
per 100,000 population. 

(c) Construct index numbers for the total patients each year 
without regard to changing population of the United 
States. 

(d) Why do these two types of index numbers differ? What 
sort of question is answered by the one based upon rates? 
What sort of question does the other answer? Does the 
principle of weighting enter into either of these index 
numbers? 

3. The next table gives the number of persons under care and the
total cost of maintenance each year for the principal public
welfare activities of the State of Indiana for a ten-year period,
1920 to 1929:

TABLE LXIII

Persons under Care and Cost of Maintenance of the Principal Public Welfare
Agencies and Institutions in Indiana, 1920-1929 1

         State Institutions      Poor Asylums         Dependent Children      Outdoor Relief
Year     Persons      Cost       Persons      Cost    Persons       Cost      Persons       Cost
1920      11,505  $3,748,893      3,087  $1,085,349    4,462   $  464,822      44,253  $  417,230
1921      12,529   4,026,403      3,271   1,025,364    4,450      587,076      79,992     610,354
1922      12,937   4,029,277      3,365   1,021,941    4,487      612,628      94,850     741,174
1923      12,913   4,173,881      3,294   1,186,232    4,479      644,511      51,256     524,298
1924      13,949   4,154,984      3,301   1,113,469    5,456      733,897      71,725     618,902
1925      15,016   4,600,119      3,433   1,065,191    6,021      794,424      74,945     840,573
1926      15,769   4,544,566      3,535   1,197,831    6,367      776,611      93,302     972,082
1927      16,567   4,765,332      3,671   1,252,816    6,365    1,031,347     111,659   1,103,590
1928      17,211   5,060,151      3,969   1,353,081    6,984      917,317     126,711   1,274,674
1929      17,477   5,145,641      4,156   1,324,797    6,960    1,049,160     137,762   1,445,758

1 Op. cit., pp. 352, 353, 459. Persons and cost for outdoor relief estimated for 1926
and 1928.

(a) Compute an index number for public welfare work in
Indiana by each of the five methods described in this chap-
ter. Devise a weighting system that will give due impor-
tance to the different types of public welfare work.

(b) In order to allow for changing population and changing
purchasing power of the dollar it will be necessary to
express the number of persons aided as the number per
100,000 population and to deflate the actual costs with an
index of the general price level. The population of Indiana
in 1920 was 2,930,390; in 1930, 3,238,503.

(c) Show the five index numbers graphically on the same pa- 
per for purposes of comparison. Use the natural scale. 

(d) What question is answered by the indexes based upon un- 
weighted aggregates and upon unweighted relatives? By 
the unweighted index based upon the geometric mean? By 
the weighted indexes? From the point of view of public 
welfare, which question do you regard as the most im- 
portant? 


7. REFERENCES 

Chaddock, R. E., Principles and Methods of Statistics, Chap. X.

Douglas, Paul H., Real Wages in the United States, 1890-1926,
Chaps. IV, XIII, XXVIII, XIX.

Fisher, Irving, The Making of Index Numbers, Chaps. I-III.

Hurlin, Ralph G., “Indexes of Family Case Work Loads,” Survey,
February 15, 1928.

Kelley, Truman L., Statistical Method, Chap. XIII.

King, Willford I., Index Numbers Elucidated.

Mills, Frederick C., Statistical Methods, Chaps. VI, IX.

White, R. Clyde, “Indexes of Public Welfare in Indiana,” Social
Forces, December, 1929.



CHAPTER XI 


Measurement of Relationships 


1. THE CONCEPT OF CORRELATION

Up to this point interest has centered in the description and analy- 
sis of a single series of data. A collection of social data presents a 
chaotic picture, until it is organized as an array or a frequency 
distribution. Something more is known about the data when an 
average is computed, and still more is known when the variation 
of individual items from the average is found. The method of 
index numbers makes possible a comparison of the magnitude of 
variables at different times or localities. Measures of central tend- 
ency and of dispersion have defined more precisely the frequency 
distribution, but they have given us no conception of the relation- 
ship between two or more series of social data. Sorokin has defined 
sociology in these words: “It seems to be the study, first, of the 
relationship and correlations between various classes of social phe- 
nomena (correlations between economic and religious; family and 
moral; juridical and economic; mobility and political phenomena 
and so on) ; second, that between social and non-social (geographi- 
cal, biological, etc.) phenomena; third, the study of the general 
characteristics common to all classes of social phenomena.” 1 If
this concept of sociology is accepted, it is obvious that the social 
statistician is especially interested in the interrelationships of social 
phenomena. It is no less true of social work than of sociology; 
the central interest of the social worker is in the relations of differ- 
ent social factors to the condition or situation with which he deals. 
At this point, then, in the study of statistical methods it is appro- 
priate to introduce ways of measuring relationships. 

1 Sorokin, Pitirim A., Contemporary Sociological Theories, pp. 7 **- New
York: Harpers, 1928.

The study of relations is not peculiar to the social sciences. Rela-
tion is the paramount fact of all science. For example, the freezing
point of water at 0° Centigrade is a measure of the relation be-
tween the condition of water and temperature at sea level. The
symbol, H2O, indicates the relation existing between definite quan-
tities of hydrogen and oxygen under specified conditions. The so-
called laws of physics, chemistry, and biology are statements of 
relationships. In view of this fact, it is less surprising that relation- 
ships in the social sciences should be regarded as paramount and 
that ways of measuring these relationships should occupy much 
of the attention of social scientists. 

The traditional conception of cause is not used much in statis- 
tics. Cause-and-effect have had a history too closely connected with 
the older metaphysics to make them of use in the social sciences, 
unless the concept be redefined. The measurement of relations 
by statistical methods is the modern substitute for the metaphysical 
concept of cause-and-effect. Instead of speaking of one fact as a 
cause and another as an effect of the first, it is the habit to speak of 
one fact as the independent variable and of the second as the de-
pendent variable. In some cases the dependent variable might just
as well be treated as the independent variable. In other cases a 
certain amount of change in the independent variable is followed 
by a definite amount of change in the dependent variable. That 
approaches the traditional conception of cause-and-effect. Correla- 
tion is a method of measuring the degree of simultaneous variation 
existing between the averages and dispersions of a dependent 
variable and one or more independent variables. It may be a 
measure of cause-and-effect analogous to the traditional usage, but 
it is not necessarily so. If a change in one fact is so closely asso- 
ciated with change in another that the second may be predicted 
from the first, there is obvious interdependence which might be 
called a cause-and-effect relation, but, if so denominated, it should 
be clear that the relations are conceived in mechanistic terms as 
reactions to stimuli or forces. However, two facts may vary simul- 
taneously and still not be related as independent and dependent 
variables. For example, the number of the population having ton- 
sils removed may increase at the same time that the number of 
automobiles increases. The trends of the two series of facts might 
be correlated mathematically and an apparently significant coeffi- 
cient of correlation found, but no one would assert that a cause- 
and-effect relation exists between the two sets of phenomena. Some 
understanding of the relation, if any, of two such kinds of phe- 
nomena is fundamental before inferences can be made regarding 





cause-and-effect on the basis of statistical correlation. If there are 
good grounds for believing that two sets of phenomena vary inter- 
dependently, the technique of correlation may be employed to 
measure the degree of such interdependence. To state the matter 
another way: the discovery of a significant degree of correlation 
either confirms an hypothesis of interdependence or it suggests an 
hypothesis of interdependence requiring further consideration by 
other methods of analysis. 
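
The caution about two series that merely rise together can be made concrete with a small sketch. The two series below are hypothetical figures invented for illustration (they are not data from the text), and the coefficient computed is the usual product-moment coefficient, which has not yet been introduced at this point in the chapter.

```python
import math


def pearson_r(x, y):
    """Product-moment coefficient of correlation between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical annual figures, both simply rising over six years.
tonsil_operations = [110, 125, 150, 160, 190, 205]   # thousands
automobiles = [8.1, 9.2, 10.5, 12.3, 13.9, 15.1]     # millions

r = pearson_r(tonsil_operations, automobiles)
print(round(r, 3))
```

The coefficient comes out close to 1 simply because both series trend upward; the figure alone says nothing about whether either series depends upon the other.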

The technique of correlation is of particular importance in the 
study of social problems, because usually there is some social 
advantage to be achieved by obtaining control over the conditions 
among which social problems arise. If the aim is to reduce mor- 
tality in a certain area of a city, then the factors which contribute 
to a high mortality rate must be determined. Mortality here 
would be the dependent variable and the other factors the inde- 
pendent variables. In the first place, the independent variables 
with respect to mortality have to be identified. Then arises the 
question as to their relative importance as “causes” of mortality. 
This can be answered by the correlation technique in so far as 
covariation may be assumed to represent interdependence. Correlation
measures whatever interrelatedness there is between two series of
social data, such as the interrelatedness of physical production and
volume of employment or
the age distribution of the population and the per cent of the 
population married. It should always be kept in mind that “cor- 
relation” in statistical discussion refers only to the degree of 
relations among numerical variables. If qualitative data are to
be analyzed by the correlation technique, then they must be re- 
duced to quasi-quantitative terms by the use of a rating scale. This 
caution has more than ordinary weight in social statistics, because 
so many social facts thought to be of great importance are quali- 
tative. Correlation technique is none the less important in the 
study of social problems, but great care is necessary in its applica- 
tion to specific data. 

2. THE MEANING OF FUNCTION 2 

2 For much of the detailed procedure which follows in this chapter the author
is indebted to Ezekiel’s Methods of Correlation Analysis, John Wiley & Sons,
New York, 1930, the most comprehensive volume yet published on this subject.

It is customary to speak of the independent variable as X and
the dependent variable as Y. Or sometimes the independent va-
riable is designated $X_1$ and the dependent variable as $X_2$. If there
are two or more independent variables the X’s are given appropri-
ate subscripts to indicate the variable to which reference is made. 3
The Y variable is also known as a function of the X variable. All
that this means is that Y is dependent upon X — that a variation 
in X is followed by a corresponding variation in Y . In a loose way, 
the variation in X might be called the cause of the variation in Y. 
When one variable is said to be the function of a second variable, 
it simply means that a variation in the second accompanies a varia- 
tion in the first — that is, a variation in X accompanies a variation 
in Y. This mathematical language is precise in its meaning, and 
expresses a complex situation in a few words. 

Table LXIV and Figure LIV illustrate the concept of function 
by means of data taken from physics. If a body moves at a uniform 
velocity, the distance traveled at any given second is equal to the 
product of the velocity and the time in seconds: 

TABLE LXIV

Distance of a Body from the Starting Point, if It
Moves at the Rate of 5 Feet per Second, at Specified
Seconds

Time in Seconds     Distance in Feet from Starting Point

       1                          5
       2                         10
       3                         15
       4                         20
       5                         25
       6                         30
       7                         35
       8                         40


The diagonal line connecting the dots gives a picture of the dis- 
tance traveled by the moving body at any specified second or frac- 
tion of a second. It expresses graphically the relation between time 
and distance. Such a diagram is the simplest way of indicating the 
functional relation of an independent and a dependent variable. 
It is obvious that a change in the X variable is accompanied by a 
definite change in the Y variable. We know from physics that 

d = vt 

3 In this text the methods of multiple correlation, partial correlation, and part
correlation are not presented. They are properly considered in an advanced 
course in statistics. 



[Figure LIV. Axis label: Y = Time in Seconds]


in which d is distance, v velocity, and t time in seconds. Or, putting
the formula in terms of X and Y,

$$Y = \frac{X}{v}$$

But suppose we think of the functional relation in terms of the
diagonal straight line. This line of relationship may be expressed
by an equation as well as by a graph. The general equation for a
straight line is:

$$Y = a + bX$$

If the diagonal line had intersected OY above O, we should have
had a small piece of OY below this intersection. That small piece
of OY would be a in the above equation. The vertical distance b,
indicated on the diagram, does not represent one second or a space
on the diagram; it represents the ratio of seconds to feet traveled
at any given second. The value of b must be computed. Since we
know the value of a to be 0, that will be easy to do by the method
of simultaneous equations. It will be necessary to assume some
value for X. It makes no difference what values are assumed for
X; so for convenience we shall assume that the values of X at
different times are 5 and 10. Since the velocity of the body is 5
feet per second, the corresponding values of Y will be 1 and 2.
The equations may then be written as follows:

$$a + b(10) = 2$$
$$a + b(5) = 1$$

Or

$$a + 10b = 2$$
$$-a - 5b = -1$$
$$5b = 1$$
$$b = .2$$

To solve the simultaneous equations, we assume the signs of the
lower equation to be changed and then add algebraically, which
cancels out the a's and leaves b as the only unknown. The value of
b is found to be .2. Substituting the values of a and b in the
equation for a straight line, we get:

$$Y = 0 + .2X$$




This is the equation of the diagonal line in the graph. It expresses 
the specific relation of X and Y . Every straight line relation be- 
tween two series of data has a specific equation which may be 
found in the same manner as the above. Because the dots repre- 
senting the intersections of ordinates and abscissas lay on a straight 
line, the equation makes exact estimates of unknown Y’s possible. 
But many series of data, when plotted, are only approximately 
represented by a straight line. Then the specific equation which 
best “fits” the distribution is not so exact, and estimates made from 
it are only approximate. Such approximate equations are commonly 
found when we are dealing with social data. 
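
The two-point solution above can be sketched in a few lines of Python. This is only an illustration under the assumptions of the example (X in feet, Y in seconds, the points (5, 1) and (10, 2)); the function name `line_through` is hypothetical and not part of the text.

```python
def line_through(p1, p2):
    """Return (a, b) for the line Y = a + bX through two (X, Y) points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two X values must differ")
    # Subtracting one equation from the other cancels a and leaves b.
    b = (y2 - y1) / (x2 - x1)
    # Substituting b back into either equation gives a.
    a = y1 - b * x1
    return a, b


a, b = line_through((5, 1), (10, 2))
print(f"Y = {a:.1f} + {b:.1f}X")
```

Any other pair of points on the same diagonal would yield the same constants, which is why the choice of X values makes no difference.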

As an example from social statistics, the relation between rates 
of misdemeanants and of felons in twenty census tract areas of 
Indianapolis will be used. Table LXV gives the data: 


TABLE LXV

Misdemeanant and Felon Rates by Census Tracts,
Indianapolis

Misdemeanant Rates     Felon Rates
        X                   Y

       2.2                 1.3
       3.8                 1.1
       4.1                 1.1
       4.1                 2.9
       5.2                 1.3
       7.2                 1.7
       7.5                 2.6
       7.7                 2.4
       7.9                 2.7
       8.4                 4.2
       9.0                 5.1
       9.1                 4.0
       9.7                 2.9
      10.4                 4.2
      11.1                 4.6
      13.5                 3.6
      13.8                 5.7
      15.5                 9.3
      19.3                10.1
      23.4                 9.6


The relation between these two series of data is represented best 
by a straight line. Figure LV shows the data as a scattergram. 
There is considerable scatter among the dots, but they lie above 
and below a straight line which might be drawn through them. 




X= MISDEMEANANT RATES 

Figure LV. — Misdemeanant and Felon Rates 






The specific equation for the straight line must be found but, 
before computing the equation for the line of best fit, a line may 
be drawn free-hand which approximates the equation of the line. 
This is done by taking the mean rates for misdemeanants and 
felons in each class-interval and plotting them. The irregular line 
drawn through the dots is this line of means. It merely indicates 
a little more clearly the general relation between the two variables, 
felon and misdemeanant rates. 

But another and more accurate method is required. The method 
commonly used is known as the method of least squares. The 
whole problem of fitting the line lies in determining the constants, 
a and b, in the equation for the straight line. The method seems
to be a little complicated, but experience in using it soon dispels
its apparently formidable character. When the student learns to 
use the method of least squares to fit a curve, he has learned a 
good deal of the procedure in the computation of coefficients and 
indexes of correlation. Table LXVI shows the computations neces- 
sary to determine the constants in the equation of the straight 
line. 


TABLE LXVI

Computation of Values (Misdemeanant and Felon Rates) for Determining
the Line of Least Squares

          Misdemeanant     Felon
             Rates         Rates
               X             Y            X²            XY

              2.2           1.3          4.84          2.86
              3.8           1.1         14.44          4.18
              4.1           1.1         16.81          4.51
              4.1           2.9         16.81         11.89
              5.2           1.3         27.04          6.76
              7.2           1.7         51.84         12.24
              7.5           2.6         56.25         19.50
              7.7           2.4         59.29         18.48
              7.9           2.7         62.41         21.33
              8.4           4.2         70.56         35.28
              9.0           5.1         81.00         45.90
              9.1           4.0         82.81         36.40
              9.7           2.9         94.09         28.13
             10.4           4.2        108.16         43.68
             11.1           4.6        123.21         51.06
             13.5           3.6        182.25         48.60
             13.8           5.7        190.44         78.66
             15.5           9.3        240.25        144.15
             19.3          10.1        372.49        194.93
             23.4           9.6        547.56        224.64

Totals      192.9          80.4      2,402.55      1,033.18
Means         9.6           4.0





The following are the formulas for obtaining the values of a
and b:

$$b = \frac{\Sigma XY - nM_xM_y}{\Sigma X^2 - nM_x^2}$$
$$a = M_y - bM_x$$

In these equations M_x and M_y stand for the means of X and Y
respectively, and n is the number of items. Substituting in these
equations the values found from the table, we have:

$$b = \frac{1{,}033.18 - 768.00}{2{,}402.55 - 1{,}843.20} = .47$$
$$a = 4.0 - 4.51 = -.51$$

Putting these values in the place of the symbols, we have:

$$Y = -.51 + .47X$$

This is the equation of the straight line which fits the data on mis- 
demeanants and felons and which describes the relation between 


TABLE LXVII

Values of Y Estimated from Values of X and the Difference between the
Actual and the Estimated Values

          Misdemeanant    Felon     Values of Y Estimated     Residuals    Residuals
             Rates        Rates     from Y = -.51 + .47X      (Y - Y')      Squared
               X            Y               Y'

              2.2          1.3               .5                  .8           .64
              3.8          1.1              1.3                 -.2           .04
              4.1          1.1              1.4                 -.3           .09
              4.1          2.9              1.4                 1.5          2.25
              5.2          1.3              1.9                 -.6           .36
              7.2          1.7              2.9                -1.2          1.44
              7.5          2.6              3.0                 -.4           .16
              7.7          2.4              3.1                 -.7           .49
              7.9          2.7              3.2                 -.5           .25
              8.4          4.2              3.4                  .8           .64
              9.0          5.1              3.7                 1.4          1.96
              9.1          4.0              3.8                  .2           .04
              9.7          2.9              4.1                -1.2          1.44
             10.4          4.2              4.4                 -.2           .04
             11.1          4.6              4.7                 -.1           .01
             13.5          3.6              5.8                -2.2          4.84
             13.8          5.7              6.0                 -.3           .09
             15.5          9.3              6.8                 2.5          6.25
             19.3         10.1              8.6                 1.5          2.25
             23.4          9.6             10.5                 -.9           .81

Totals      192.9         80.4             80.5                             24.09





the two series of data. From it we can estimate the value of Y by
assuming any value of X falling within the limits of the actual
X's, that is, from 2.2 to 23.4. That the equation is not reliable for
values of X outside the limits of the actual data is shown by the
fact that, if we assume X to be 1, then the value of Y' is -.04.
It is nonsense to think of having less than 0 felons.4
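
The least-squares computation can be sketched in Python as follows. This is a minimal illustration, assuming the twenty pairs of Table LXVI; the function name `fit_line` is hypothetical. Because the sketch carries the means at full precision, whereas the text rounds them to 9.6 and 4.0 before substituting, the constants it prints come out slightly different from .47 and -.51.

```python
# Misdemeanant rates (X) and felon rates (Y) from Table LXVI.
X = [2.2, 3.8, 4.1, 4.1, 5.2, 7.2, 7.5, 7.7, 7.9, 8.4,
     9.0, 9.1, 9.7, 10.4, 11.1, 13.5, 13.8, 15.5, 19.3, 23.4]
Y = [1.3, 1.1, 1.1, 2.9, 1.3, 1.7, 2.6, 2.4, 2.7, 4.2,
     5.1, 4.0, 2.9, 4.2, 4.6, 3.6, 5.7, 9.3, 10.1, 9.6]


def fit_line(xs, ys):
    """Least-squares constants (a, b) for Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of XY less n times the product of the means.
    sum_xy = sum(x * y for x, y in zip(xs, ys)) - n * mean_x * mean_y
    # Denominator: sum of X squared less n times the squared mean of X.
    sum_xx = sum(x * x for x in xs) - n * mean_x ** 2
    b = sum_xy / sum_xx
    a = mean_y - b * mean_x
    return a, b


a, b = fit_line(X, Y)
print(f"Y = {a:.2f} + {b:.2f}X")
```

Estimates made with the fitted constants should be confined to values of X between 2.2 and 23.4, for the reason given above.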

Using the equation computed, we have estimated a value of Y 
for each value of X. The estimated values of Y are designated 
Y' (Table LXVII). 

The total of the estimated values is close to the total of the actual 
values of Y. Thirteen of the estimated values are too large and 7 
are too small. The straight line is, therefore, not a very close fit. 
However, it illustrates the method of fitting a straight line to data. 
If we had a larger number of cases, the fit might more evenly 
divide the minus and plus differences. 

The residuals, or differences between the actual values of Y
and the estimated values of Y', have been squared so that the
standard error of estimate could be computed. The formula for
the standard error of estimate is:

$$S_{y.x}^2 = \frac{\Sigma z^2}{n}$$

in which z is a residual (Y - Y') and n is the number of items.
Substituting the values already obtained in this equation, we have:

$$S_{y.x}^2 = \frac{24.09}{20}$$

Extracting the square root of both sides of the equation,

$$S_{y.x} = 1.1$$

The chances are approximately 2 to 1 that any estimate of the 
value of Y from the equation will not vary more than 1.1 above 
or below the actual Y value. The subscript of S indicates that Y
is estimated from values of X; any value for X may be assumed,
provided it is neither smaller nor larger than actual values of X 
given in the table. 
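
As a check on the arithmetic, the standard error of estimate can be recomputed from the actual and estimated columns of Table LXVII. The sketch below assumes the sum of squared residuals is divided by n, as in the substitution 24.09/20 above; the function name is hypothetical.

```python
import math

# Actual felon rates (Y) and estimated values (Y') from Table LXVII.
Y_ACTUAL = [1.3, 1.1, 1.1, 2.9, 1.3, 1.7, 2.6, 2.4, 2.7, 4.2,
            5.1, 4.0, 2.9, 4.2, 4.6, 3.6, 5.7, 9.3, 10.1, 9.6]
Y_ESTIMATED = [0.5, 1.3, 1.4, 1.4, 1.9, 2.9, 3.0, 3.1, 3.2, 3.4,
               3.7, 3.8, 4.1, 4.4, 4.7, 5.8, 6.0, 6.8, 8.6, 10.5]


def standard_error_of_estimate(actual, estimated):
    """Square root of the mean squared residual (sum of z squared over n)."""
    n = len(actual)
    sum_sq = sum((y - e) ** 2 for y, e in zip(actual, estimated))
    return math.sqrt(sum_sq / n)


s = standard_error_of_estimate(Y_ACTUAL, Y_ESTIMATED)
print(round(s, 1))
```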

The fitted straight line is shown in Figure LVI in relation to 
the actual distribution of data. The broken lines drawn parallel 
to the solid line represent the limits of the standard error of esti- 
mate above and below the estimated values represented by the 
solid line. The chances are 2 to 1 that any actual Y will fall be- 
tween the broken lines. 

4 See Ezekiel, op. cit., p. 60.




X= MISDEMEANANT RATES 

Figure LVI. — Straight Line Fitted to Misdemeanant and Felon Rates 






Figure LVII. — Types of Standard Curves with the Formula for Each

c. Semi-Logarithmic Curve, Y = a + b log X
d. Logarithmic Curve, log Y = a + b log X
e. Parabola, Y = a + bX + cX²
f. Hyperbola, Y = 1/(a + bX)




The foregoing discussion related only to straight line relation- 
ships. But in the study of social statistics it will often be found that 
the relation between two variables is not linear, but curvilinear. 
The distribution may take the form of a parabola, a hyperbola, or 
a logarithmic curve. In such cases the fitted curve is computed by 
methods differing considerably from that used in fitting a straight 
line. Six common types of curves are shown in Figure LVII. 
It will be noticed that the formulas for curves involving the use 
of logarithms are identical with the formula for the straight line, 
except that the logarithm of X or Y or of both is used. Likewise 
the formula for the hyperbola in the lower right corner resembles 
the straight line formula, except that in the case of the hyperbola 
Y is equal to the reciprocal of a + bX. It is obvious that the fitting 
of a hyperbola is done in the same manner as fitting a straight line, 
and then the reciprocal for a + bX is found, which is the value of
Y. The fitting of a simple parabola involves the computation of a

TABLE LXVIII

Per Cent of Land Used for Business Purposes and Felon Rate with
Computations

          Per Cent of    Felon     Logarithm
             Land        Rate        of X
              X            Y        (log X)      (log X)²     Y(log X)

              7.1         2.6         .851         .7242        2.2126
              6.8         1.7         .833         .6939        1.4161
              4.6         2.7         .663         .4396        1.7901
              7.5         2.9         .875         .7656        2.5275
             16.3         5.7        1.212        1.4689        6.9084
              5.1         2.1         .708         .5013        1.4868
              5.3         1.6         .724         .5242        1.1584
             16.6         3.6        1.220        1.4884        4.3920
             13.3         5.1        1.124        1.2634        5.7324
              9.8         4.2         .991         .9821        4.1622
              9.0         2.5         .954         .9101        2.3850
             10.6         1.3        1.025        1.0506        1.3325
             15.9         1.1        1.201        1.4424        1.3211
             20.0         4.9        1.301        1.6926        6.3749
             23.3         4.0        1.367        1.8687        5.4480
             23.1        13.8        1.364        1.8605       18.8232
             22.0         4.4        1.342        1.8010        5.9048
             24.3         7.9        1.386        1.9210       10.9494
             33.9         2.8        1.530        2.3409        4.2840
             34.0         3.3        1.531        2.3440        5.0523

Totals      308.5        78.2       22.202       26.0834       93.6817
Means                     3.9        1.110






Figure LVIII. — Felony Data with Fitted Curve and Limits of Error of Estimate 




third constant, c, and of X². In view of the fact that each of the
logarithmic formulas is used in a similar manner, only one will be 
illustrated. A simple parabola will be fitted to the same data to 
illustrate the use of this formula. 

The data used for the illustration are the per cent of land used 
for business purposes and the felon rate in certain census tract 
areas in Indianapolis. Table LXVIII gives the data and the 
necessary computations for fitting a logarithmic curve of the form 
Y = a + b log X. 

The first step is to find the values of the constants, a and b, and
the formula is similar to that used in finding the constants for the
straight line equation:

$$b = \frac{\Sigma Y\bar{X} - nM_yM_{\bar{x}}}{\Sigma \bar{X}^2 - n(M_{\bar{x}})^2}$$
$$a = M_y - bM_{\bar{x}}$$

The bar over X or Y indicates that the logarithm is used instead
of the actual figures. Substituting in the above equations, we have:

$$b = \frac{93.6817 - 86.5800}{26.0834 - 24.6420} = 4.93$$
$$a = 3.9 - 4.93(1.110) = -1.57$$

The equation of the particular curve which fits the data is,
therefore:

$$Y = -1.57 + 4.93\bar{X}$$

The equation is expressed in terms of Y. So, in using this equation 
for purposes of estimation it is not necessary to convert the values 
of Y to the anti-logarithms, or natural numbers; the values ob- 
tained will be actual values of Y. 
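
The same fit can be sketched in Python by replacing each X with its common logarithm and then proceeding exactly as for a straight line. This is an illustration only, assuming the data of Table LXVIII; the function name `fit_semilog` is hypothetical. Since the sketch uses full-precision logarithms and means, while the text works with three-place logarithms and rounded means, its constants come out somewhat different from 4.93 and -1.57 (roughly 4.8 and -1.4).

```python
import math

# Per cent of land used for business (X) and felon rate (Y), Table LXVIII.
X = [7.1, 6.8, 4.6, 7.5, 16.3, 5.1, 5.3, 16.6, 13.3, 9.8,
     9.0, 10.6, 15.9, 20.0, 23.3, 23.1, 22.0, 24.3, 33.9, 34.0]
Y = [2.6, 1.7, 2.7, 2.9, 5.7, 2.1, 1.6, 3.6, 5.1, 4.2,
     2.5, 1.3, 1.1, 4.9, 4.0, 13.8, 4.4, 7.9, 2.8, 3.3]


def fit_semilog(xs, ys):
    """Least-squares constants (a, b) for Y = a + b log10(X)."""
    logs = [math.log10(x) for x in xs]   # every X must be positive
    n = len(logs)
    mean_log = sum(logs) / n
    mean_y = sum(ys) / n
    num = sum(l * y for l, y in zip(logs, ys)) - n * mean_log * mean_y
    den = sum(l * l for l in logs) - n * mean_log ** 2
    b = num / den
    a = mean_y - b * mean_log
    return a, b


def estimate(a, b, x):
    """Estimated Y for a given X; no anti-logarithm is needed."""
    return a + b * math.log10(x)


a, b = fit_semilog(X, Y)
print(f"Y = {a:.2f} + {b:.2f} log X")
```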

Table LXIX gives the estimated values for Y and the residuals, 
that is, the differences between actual values and estimated values 
of Y. 

Figure LVIII shows the distribution of the data, the fitted curve,
and the limits of the standard error of estimate. The formula for
the standard error of estimate for curvilinear distributions is
similar to that for linear distributions. It is:

$$S_{y.x}^2 = \frac{\Sigma z^2}{n}$$



TABLE LXIX

Actual Values of Y, Estimated Values of Y, and the Residuals

                                 Y - Y',
             Y         Y'         or z          z²

            2.6        2.6          0            0
            1.7        2.5        -.8           .64
            2.7        1.7        1.0          1.00
            2.9        2.7         .2           .04
            5.7        4.4        1.3          1.69
            2.1        1.9         .2           .04
            1.6        2.0        -.4           .16
            3.6        4.4        -.6           .36
            5.1        4.0        1.1          1.21
            4.2        3.5         .7           .49
            2.5        3.1        -.6           .36
            1.3        3.5       -2.2          4.84
            1.1        4.4       -3.3         10.89
            4.9        4.8         .1           .01
            4.0        5.1       -1.1          1.21
           13.8        5.1        8.7         75.69
            4.4        5.0        -.6           .36
            7.9        5.3        2.6          6.76
            2.8        6.0       -3.2         10.24
            3.3        6.0       -2.7          7.29

Totals     78.2       78.0                   123.28


The standard error of estimate is 2.5. That is large. If the one 
extreme variation is left out, the standard error of estimate for 19 
of the items is 1.5. Although the curve is not a very good fit, the 
logarithmic curve of this type would probably fit the data more 
closely if a larger number of items were included. The method of 
computation is the same, regardless of the closeness of fit. 

While these data on crime are not distributed in the form of a
simple parabola, we shall use them for purposes of illustrating
the method of fitting a parabola.5 As in the previous problem, the
chief task is to compute the constants a, b, and c in the formula

$$Y = a + bX + cX^2$$

The a and b values will differ from their values in the logarithmic
formula, unless c is zero. To determine the constants the following
equations must be solved:

$$(\Sigma x^2)b + (\Sigma xu)c = \Sigma xy$$
$$(\Sigma xu)b + (\Sigma u^2)c = \Sigma uy$$
$$a = M_y - bM_x - cM_u$$

5 See Ezekiel, op. cit., pp. 72-78. Also F. C. Mills, op. cit ., pp. 284-290, 300-306. 





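To find the values necessary for the solution of these equations a
table similar to the one used for the values for the logarithmic
formula can be used. For convenience U may be used for X² in
certain combinations which will be shown in the table. The method
of determining the values of the above equations is as follows:

$$M_x = \frac{\Sigma X}{n} \qquad M_y = \frac{\Sigma Y}{n} \qquad M_u = \frac{\Sigma U}{n}$$
$$\Sigma x^2 = \Sigma X^2 - n(M_x)^2$$
$$\Sigma xu = \Sigma XU - nM_xM_u$$
$$\Sigma u^2 = \Sigma U^2 - n(M_u)^2$$
$$\Sigma xy = \Sigma XY - nM_xM_y$$
$$\Sigma uy = \Sigma UY - nM_uM_y$$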


TABLE LXX

Computation of Values for Fitting a Simple Parabola — Crime Data

          Per Cent    Felony
          of Land      Rate       X² or
             X           Y          U           XU             U²            XY          UY

            7.1         2.6       50.41       357.91        2541.17        18.46       131.07
            6.8         1.7       46.24       314.43        2138.14        11.56        78.61
            4.6         2.7       21.16        97.34         447.75        12.42        57.13
            7.5         2.9       56.25       421.86        3164.06        21.75       163.13
           16.3         5.7      265.69      4330.75       70596.49        92.91      1513.43
            5.1         2.1       26.01       132.66         676.52        10.71        54.62
            5.3         1.6       28.09       148.88         789.05         8.48        44.94
           16.6         3.6      275.56      4574.30       75955.36        59.76       992.02
           13.3         5.1      176.89      2352.64       34931.61        67.83       902.14
            9.8         4.2       96.04       941.19        9223.68        41.16       403.37
            9.0         2.5       81.00       729.00        6561.00        22.50       202.50
           10.6         1.3      112.36      1191.02       12633.76        13.78       146.07
           15.9         1.1      252.81      4019.70       63907.84        16.49       278.09
           20.0         4.9      400.00      8000.00      160000.00        98.00      1960.00
           23.3         4.0      542.89     12649.34      294740.41        93.20      2171.56
           23.1        13.8      533.61     12326.39      284728.96       318.78      7363.82
           22.0         4.4      484.00     10648.00      234256.00        96.80      2129.60
           24.3         7.9      590.49     14348.91      348690.25       191.97      4664.87
           33.9         2.8     1149.21     38958.22     1320201.00        94.92      3217.79
           34.0         3.3     1156.00     39304.00     1336336.00       112.20      3814.80

Totals    308.5        78.2     6344.71    155799.54     4262519.05      1403.68     30289.56

M_x = 15.5.     M_y = 3.9.     M_u = 317.2.





Substituting the required values in the preliminary equations
shown above, we have:

$$\Sigma x^2 = 6344.71 - 4805.00 = 1539.71$$
$$\Sigma xu = 155799.54 - 98332.00 = 57467.54$$
$$\Sigma u^2 = 4262519.05 - 2012316.80 = 2250202.25$$
$$\Sigma xy = 1403.68 - 1209.00 = 194.68$$
$$\Sigma uy = 30289.56 - 24741.60 = 5547.96$$

Using these derived values in the equations to be solved
simultaneously, we have

$$1539.71b + 57467.54c = 194.68 \qquad \text{(I)}$$
$$57467.54b + 2250202.25c = 5547.96 \qquad \text{(II)}$$

These equations are most easily solved by the Doolittle method.
Putting down equation (I), dividing it through by the coefficient
of b with the sign changed, and placing the derived equation (I')
under it, we have

$$1539.71b + 57467.54c = 194.68 \qquad \text{(I)}$$
$$-b - 37.32361c = -0.12643 \qquad \text{(I')}$$

Equation (II) is then put down, equation (I') is multiplied by
the coefficient of c in equation (I), and the result is placed
under equation (II):

$$57467.54b + 2250202.25c = 5547.96 \qquad \text{(II)}$$
$$-57467.54b - 2144896.05c = -7265.62 \qquad \text{(I' times 57467.54)}$$

Adding,

$$105306.20c = -1717.66$$
$$c = -0.01631$$

Substituting this value of c in equation (I), we have

$$1539.71b - 937.30 = 194.68$$
$$1539.71b = 194.68 + 937.30$$
$$b = .74$$

The third constant may now be computed:

$$a = M_y - bM_x - cM_u$$
$$= 3.9 - (.74)(15.5) - (-0.01631)(317.2)$$
$$= 3.9 - 11.32 + 5.17$$
$$= -2.25$$

The equation for the parabola is, therefore:

$$Y = -2.25 + .74X - 0.01631X^2$$

With this equation the values of Y may be estimated for values 




of X lying between the lowest and the highest actual values given 
in Table LXX. But the parabola does not fit these data, and the 
detailed estimates are not presented here. The logarithmic curve is 
a better fit. 
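
The parabola computation may be sketched in Python as below. It is an illustration under stated assumptions: the two simultaneous equations in b and c are formed from deviation sums as in the text, but they are solved here by Cramer's rule rather than by the Doolittle tabulation, and the function name `fit_parabola` is hypothetical. Because the sketch works from the raw X and Y columns at full precision, its constants for the crime data will not agree exactly with the rounded hand computation above.

```python
def fit_parabola(xs, ys):
    """Least-squares constants (a, b, c) for Y = a + bX + cX^2."""
    n = len(xs)
    us = [x * x for x in xs]                     # U stands for X squared
    mx, my, mu = sum(xs) / n, sum(ys) / n, sum(us) / n
    # Sums of squares and products taken about the means.
    sxx = sum(x * x for x in xs) - n * mx * mx
    sxu = sum(x * u for x, u in zip(xs, us)) - n * mx * mu
    suu = sum(u * u for u in us) - n * mu * mu
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * mx * my
    suy = sum(u * y for u, y in zip(us, ys)) - n * mu * my
    # Solve  sxx*b + sxu*c = sxy  and  sxu*b + suu*c = suy.
    det = sxx * suu - sxu * sxu
    if det == 0:
        raise ValueError("X values do not determine a parabola")
    b = (sxy * suu - sxu * suy) / det
    c = (sxx * suy - sxu * sxy) / det
    a = my - b * mx - c * mu
    return a, b, c


# Per cent of land (X) and felony rate (Y) from Table LXX.
X = [7.1, 6.8, 4.6, 7.5, 16.3, 5.1, 5.3, 16.6, 13.3, 9.8,
     9.0, 10.6, 15.9, 20.0, 23.3, 23.1, 22.0, 24.3, 33.9, 34.0]
Y = [2.6, 1.7, 2.7, 2.9, 5.7, 2.1, 1.6, 3.6, 5.1, 4.2,
     2.5, 1.3, 1.1, 4.9, 4.0, 13.8, 4.4, 7.9, 2.8, 3.3]

a, b, c = fit_parabola(X, Y)
print(f"Y = {a:.2f} + {b:.2f}X + {c:.5f}X^2")
```

A quick check of such a routine is to feed it points lying exactly on a known parabola and confirm that the three constants are recovered.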

3. MEASUREMENT OF THE DEGREE OF RELATIONSHIP 

The methods of measuring relationship have so far shown 
whether or not a relation existed between the two series of data 
and have shown how closely the values of the dependent variable 
may be estimated from values of the independent variable. The 
first indication of relationship was determined by finding an equa- 
tion which appeared to conform to the distribution and by plotting 
a curve with the estimated values. The second result was obtained 
by computing the standard error of estimate. Neither of these 
methods gives a clear idea of the importance of the interrelation- 
ship. Another method is necessary for this purpose. 

This is the method of correlation. The degree of correlation is 
expressed as a coefficient. The degree of correlation in linear rela- 
tions, that is, relations which may best be represented by a straight 
line, may vary from — 1.0 to +1.0. If the coefficient comes out 
with a minus sign, it means that, when a change occurs in the 
independent variable, a corresponding change in the opposite 
direction occurs in the dependent variable. If the sign of the co- 
efficient is plus, then a change in the independent variable is 
accompanied by a corresponding change in the same direction in 
the dependent variable. A coefficient of plus or minus one would 
be perfect correlation. In curvilinear relations the coefficient may
vary from 0 to 1.0, the latter being perfect correlation. The
measure of relationship in curvilinear relations is called an index
of correlation and is designated by ρ to distinguish it from the
measure of relationship in linear relations which is called a
coefficient of correlation and is designated by r.

Perfect correlation is almost never found between two variables. 
A certain amount of variance in X is accompanied by a certain 
amount of variance in Y, and the measure of this interdependence
is something less than unity. Social factors are influenced by a 
variety of things, and the aim of the statistician is to measure the 
amount of influence which an independent variable has upon a 
dependent variable or the amount of influence several independent 
variables have in combination upon a dependent variable. If a 
reliable measure of such relations can be obtained, the first step 





has been taken toward control in the case of the variables con- 
sidered. It cannot be said that a coefficient of correlation, if multi- 
plied by 100, indicates the percentage of variance in Y due to 
variance in X. A slightly different measure is needed for this
purpose. “Where both X and Y are assumed to be built up of simple
elements of equal variability all of which are present in Y but
some of which are lacking in X,” says Ezekiel, “it can be proved
mathematically that r² measures that proportion of all the
elements in Y which are also present in X. For that reason in cases
where the dependent variable is known to be causally related to
the independent variable, r² may be called the coefficient of
determination. It may be said to measure the percent to which
variance in Y is determined by X, since it measures that proportion
of all the elements of variance in Y which are also present in X.”
Likewise, “Where curvilinear relations have been used in
determining the relationship, the term ‘index of determination’
will be used to denote the value of ρ², thus retaining the same
relation to the index of correlation that the coefficient of
determination bears to r, the coefficient of correlation.”6 Hence
we have two measures each for
the degree of correlation in linear relations and in curvilinear rela- 
tions, but they mean slightly different things. We shall illustrate, 
first, the method of computing the coefficient of correlation and 
the coefficient of determination for ungrouped data. 

The formula for the coefficient of correlation used here is:7

$$r = \frac{\Sigma XY - nM_xM_y}{\sqrt{(\Sigma X^2 - nM_x^2)(\Sigma Y^2 - nM_y^2)}}$$

in which

r = coefficient of correlation
ΣXY = sum of the products of the two corresponding variables
nM_xM_y = product of the means times the number of items
ΣX² = sum of the squares of the X-variable
nM_x² = the square of the mean of the X-variable times the number
of items
ΣY² = sum of the squares of the Y-variable
nM_y² = the square of the mean of the Y-variable times the number
of items

The method of computing these values is illustrated by Table 
LXXI. 

6 Op. cit., p. 120. [Italics mine. R. C. W.]

7 This formula is taken from Ezekiel, op. cit., p. 127. 



TABLE LXXI

Computation of Values for Determining the Coefficient of Correlation

          Misdemeanant    Felon
              Rate        Rate
               X            Y           X²           Y²           XY

              2.2          1.3         4.84         1.69         2.86
              3.8          1.1        14.44         1.21         4.18
              4.1          1.1        16.81         1.21         4.51
              4.1          2.9        16.81         8.41        11.89
              5.2          1.3        27.04         1.69         6.76
              7.2          1.7        51.84         2.89        12.24
              7.5          2.6        56.25         6.76        19.50
              7.7          2.4        59.29         5.76        18.48
              7.9          2.7        62.41         7.29        21.33
              8.4          4.2        70.56        17.64        35.28
              9.0          5.1        81.00        26.01        45.90
              9.1          4.0        82.81        16.00        36.40
              9.7          2.9        94.09         8.41        28.13
             10.4          4.2       108.16        17.64        43.68
             11.1          4.6       123.21        21.16        51.06
             13.5          3.6       182.25        12.96        48.60
             13.8          5.7       190.44        32.49        78.66
             15.5          9.3       240.25        86.49       144.15
             19.3         10.1       372.49       102.01       194.93
             23.4          9.6       547.56        92.16       224.64

Total       192.9         80.4      2402.55       469.88      1033.18
Mean          9.6          4.0

Substituting in the equation, we have:

$$r = \frac{1033.18 - 20(9.6)(4.0)}{\sqrt{[2402.55 - 20(92.16)][469.88 - 20(16.0)]}}$$
$$= \frac{1033.18 - 768.00}{\sqrt{(559.35)(149.88)}}$$
$$= \frac{265.18}{289.54}$$
$$= .916$$

This is the unadjusted coefficient of correlation between the
residences of misdemeanants and the residences of felons.8 The
standard deviation of either X or Y may be obtained from the values
computed in the following manner: subtract the square of the
mean multiplied by the number of items from the sum of the
squares of X or Y, divide by the number of items, and extract the
square root.

8 Another way to compute the coefficient of correlation is known as the
product-moment method proposed by Karl Pearson. The formula for this
method is

$$r = \frac{\Sigma xy}{n\sigma_x\sigma_y}$$

in which x and y are the deviations from the respective means. The same kind
of table is used for computing the values as above, except that two additional
columns are necessary for the deviations from the means. The product-moment
method is somewhat longer than the method used above, and the results obtained
are almost identical. Consequently, the method used for illustration is preferable
as a labor-saving device.

It was noted above that the coefficient .916 is unadjusted, that
is, the number of items, or observations, has not been allowed for.
That is particularly necessary when the number of items is small,
as in the illustration. The following formula is used for adjusting
the coefficient for the number of items:

$$\bar{r}_{yx}^2 = 1 - (1 - r_{yx}^2)\left(\frac{n - 1}{n - 2}\right)$$

The expression (n - 2) stands for the number of items less the
number of constants (a, b, c, etc.) in the equation describing the
relation. Since the relation between misdemeanants and felons
seems to be linear, there are two constants, because there are two
constants in the equation of a straight line. Substituting the values
in the equation, we have:

$$\bar{r}_{yx}^2 = 1 - (1 - .839)\left(\frac{19}{18}\right) = .830$$
$$\bar{r}_{yx} = .911$$

Adjustment for the number of observations slightly reduces the
size of the coefficient. Since the coefficient of determination is the
square of the adjusted coefficient of correlation, we have

$$\bar{r}_{yx}^2 = .830$$
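
The coefficient of correlation and its adjustment can be sketched in Python as follows. The sketch assumes the misdemeanant and felon rates of Table LXXI, and it assumes the adjustment takes the form 1 - (1 - r²)(n - 1)/(n - m), with m the number of constants, which reproduces the figures .911 and .830 from .916; the function names are hypothetical. Carrying the means at full precision gives an unadjusted r a little below the .916 obtained with the rounded means.

```python
import math

# Misdemeanant rates (X) and felon rates (Y) from Table LXXI.
X = [2.2, 3.8, 4.1, 4.1, 5.2, 7.2, 7.5, 7.7, 7.9, 8.4,
     9.0, 9.1, 9.7, 10.4, 11.1, 13.5, 13.8, 15.5, 19.3, 23.4]
Y = [1.3, 1.1, 1.1, 2.9, 1.3, 1.7, 2.6, 2.4, 2.7, 4.2,
     5.1, 4.0, 2.9, 4.2, 4.6, 3.6, 5.7, 9.3, 10.1, 9.6]


def correlation(xs, ys):
    """Unadjusted coefficient of correlation r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * mx * my
    sxx = sum(x * x for x in xs) - n * mx * mx
    syy = sum(y * y for y in ys) - n * my * my
    return sxy / math.sqrt(sxx * syy)


def adjusted(r, n, m=2):
    """Adjust r for n items and m constants (assumed form, see above)."""
    r_sq = 1 - (1 - r * r) * (n - 1) / (n - m)
    # The sign of r is carried over to the adjusted coefficient.
    return math.copysign(math.sqrt(max(r_sq, 0.0)), r)


r = correlation(X, Y)
r_adj = adjusted(r, len(X))
print(round(r, 3), round(r_adj, 3), round(r_adj ** 2, 3))
```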

In linear correlation the correlation may be either positive or 
negative. If the two series of data change simultaneously in the 
same direction, the correlation is positive. Examination of the table 
above shows that, when misdemeanant rates are high, felon rates 
are also likely to be high. To indicate the direction of variation, a 
plus sign may be placed in front of the coefficient: +.911. If one
series of the data had shown a decrease when the other showed an 
increase, the correlation would have been negative, and a minus 
sign would have been placed before the coefficient. The required 
sign can always be determined by inspection of the table of
variables or of a scattergram. On a scattergram if the dots are
distributed in a rising direction from left to right, the correlation
is indicated as positive. If the dots are distributed in a falling
direction from left to right, the correlation is negative.9 When the
computation of the correlation is carried through, the coefficient
comes out with the appropriate algebraic sign affixed; from the
scattergram the sign can be guessed in advance of computation.

The regression equation and the standard error of estimate have
not yet been computed. The regression equation requires the
computation of the constants for the equation of a straight line,

$$Y = a + bX.$$

We shall compute the value of b, when Y is the dependent
variable.

$$b_{yx} = \frac{\Sigma XY - nM_xM_y}{\Sigma X^2 - nM_x^2}$$
$$= \frac{1033.18 - 768.00}{2402.55 - 1843.20}$$
$$= \frac{265.18}{559.35} = .476$$

$$a = M_y - bM_x$$
$$= 4.0 - 4.570 = -.570$$

The regression equation of Y on X is, then,

$$Y = -.570 + .476X$$

Logically and practically, it is the estimation of Y from X that is 
desired. It is possible, however, to treat Y as the independent 
variable and X as the dependent variable in the regression equa- 
tion and compute the change in X for each unit of change in Y 
from the following variation in the preceding formula:

$$b_{xy} = \frac{\Sigma XY - nM_xM_y}{\Sigma Y^2 - nM_y^2}$$

Since the arithmetic involved in this formula is the same as in the 
preceding formula, the computation is not carried out. 

The formula for the standard error of estimate for the adjusted 
coefficient of correlation is as follows: 


Or,

$$S_{yx} = 1.17$$

9 This is true, of course, only if the X-scale runs from left to right and the
Y-scale from bottom to top. Reversal of the Y-scale would reverse the direction
of the regression line.








We may now make a scattergram, draw the regression line from 
estimates of Y , and insert the lines parallel to the regression line 
to show the limits of the range of the standard error of estimate. 
Figure LIX shows the regression line and the standard error of 
estimate determined from the coefficient of correlation. 

The chances are 2 to 1 that any estimate of the value of Y from
a value of X will fall within the limits indicated by the parallel
broken lines. It is worthy of note that the standard error of
estimate for the fitted straight line is 1.1, whereas the standard
error of estimate determined from the coefficient of correlation is
1.16, or considerably larger than the first. The latter is more
dependable, because it includes a consideration of the relative
importance of the variations of the two variables.10

Up to this point the discussion of correlation has dealt with the 
degree of correlation between two series of data whose relationship 
may be described by a straight line. But we have seen that some 
relations are curvilinear. The method of computing the index of 
correlation and the index of determination is somewhat different 
from the methods used to compute the coefficient of correlation 
and the coefficient of determination. For purposes of illustration 
the data on per cent of land used for business purposes by census 
tracts in Indianapolis and the felon rates by census tracts will be 
used. Referring back to Figure LVIII, it is clear that the relation 


TABLE LXXII

Computation of Group Averages to Indicate the Form of the Regression
Curve: Crime Data

X = Per Cent of Land; Y = Felon Rate

           0-9.9          10-19.9         20-29.9         30-39.9
          X      Y        X      Y        X      Y        X      Y

         4.6    2.7     13.3    5.1     20.0    4.9     33.9    2.8
         7.1    2.6     10.6    1.3     23.3    4.0     34.0    3.3
         6.8    1.7     16.3    5.7     23.1   13.8
         7.5    2.9     16.6    3.6     22.0    7.9
         5.1    2.1     15.9    1.1     24.3    7.9
         5.3    1.6
         9.8    4.2
         9.0    2.5

Total   55.2   20.3     72.7   16.8    112.7   38.5     67.9    6.1
Mean     6.9    2.5     14.5    3.4     22.5    7.7     34.0    3.1


10 See Ezekiel, op. cit., pp. 117, 118.



Figure LX. Scattergram with Line of Means and Freehand Curve Superimposed: Crime Data


between these two series is curvilinear. If we had not already
determined the fit of a curve to these data, it would now be necessary
to fit a freehand curve in order to determine the number of
constants for the regression equation. This will be done so that the
method may be clear to the student.
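
The group averages of Table LXXII can be checked with a short computation. The following is a minimal sketch in Python, not part of the original text; the function name `group_means` and the variable names are illustrative assumptions. It groups the 20 census tracts into 10-point classes of X and averages X and Y within each class, which gives the points of the line of means.

```python
from collections import defaultdict

# (X, Y) pairs: per cent of land in business use and felon rate (Table LXXII).
# The Y value paired with X = 22.0 is taken as 7.9, the figure that agrees with
# the printed column total (38.5) and with Table LXXIII.
tracts = [
    (4.6, 2.7), (7.1, 2.6), (6.8, 1.7), (7.5, 2.9),
    (5.1, 2.1), (5.3, 1.6), (9.8, 4.2), (9.0, 2.5),
    (13.3, 5.1), (10.6, 1.3), (16.3, 5.7), (16.6, 3.6), (15.9, 1.1),
    (20.0, 4.9), (23.3, 4.0), (23.1, 13.8), (22.0, 7.9), (24.3, 7.9),
    (33.9, 2.8), (34.0, 3.3),
]


def group_means(pairs, width=10):
    """Return {class lower bound: (mean X, mean Y)} for classes of the given width."""
    groups = defaultdict(list)
    for x, y in pairs:
        lower = int(x // width) * width  # lower bound of the class containing x
        groups[lower].append((x, y))
    means = {}
    for lower, members in sorted(groups.items()):
        n = len(members)
        means[lower] = (
            sum(x for x, _ in members) / n,
            sum(y for _, y in members) / n,
        )
    return means


line_of_means = group_means(tracts)
for lower, (mean_x, mean_y) in line_of_means.items():
    print(f"class {lower:>2}-{lower + 9.9:.1f}: mean X = {mean_x:.2f}, mean Y = {mean_y:.2f}")
```

Plotting the four (mean X, mean Y) points on the scattergram and joining them gives the line of means to which the freehand curve is compared.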

If the means of the columns are plotted on a scattergram of the
original data, it will be seen that 11 of the dots fall below the line
connecting the means, that 8 fall above this line and that one falls
on the line. The line of means obviously cannot be represented by
a straight line; it is concave downward. Now a smooth curve may
be drawn freehand as nearly as possible to fit the data. If it were
not for the one extremely high felon rate, the freehand curve
would be more concave than it is. Figure LX presents the data.
An examination of Figure LVII suggests that the freehand curve
approaches nearest to the curve (concave downward) whose equation
is Y = a + b log X. These are the same data to which a
logarithmic curve was fitted above.11 The former curve was fitted

TABLE LXXIII

Computation of Quantities for the Residuals and the Standard Deviation for
Curvilinear Correlation: Crime Data

Per Cent    Felon    Felon Rate      Y - Y'
of Land     Rate     Estimated
                     from Curve
   X          Y         Y'            (z)        z²         Y²

   4.6       2.7        1.5           1.2       1.44       7.29
   7.1       2.6        2.2            .4        .16       6.76
   6.8       1.7        2.1          - .4        .16       2.89
   7.5       2.9        2.3            .6        .36       8.41
   5.1       2.1        1.7            .4        .16       4.41
   5.3       1.6        1.7          - .1        .01       2.56
   9.8       4.2        2.8           1.4       1.96      17.64
   9.0       2.5        2.7          - .2        .04       6.25
  13.5       5.1        3.7           1.4       1.96      26.01
  10.6       1.3        3.0          -1.7       2.89       1.69
  16.3       5.7        4.2           1.5       2.25      32.49
  16.6       3.6        4.3          - .7        .49      12.96
  15.9       1.1        4.2          -3.1       9.61       1.21
  20.0       4.9        4.8            .1        .01      24.01
  23.3       4.0        5.2          -1.2       1.44      16.00
  23.1      13.8        5.2           8.6      73.96     190.44
  22.0       7.9        5.1           2.8       7.84      62.41
  24.3       7.9        5.2           2.7       7.29      62.41
  33.9       2.8        5.4          -2.6       6.76       7.84
  34.0       3.3        5.3          -2.0       4.00      10.89

Totals
 308.7      81.7       72.6          +9.1     122.79     504.57


11 See pp. 291-293. 




mathematically, but it is common practice in computing curvilinear 
correlation to use the freehand curve from which to read off the 
values of Y corresponding to values of X on the graph. This will 
be done in the problem here. The estimated values of Y lie on the 
smooth logarithmic curve which was drawn freehand. In the computation
of the index of correlation and the index of determination
logarithms are not used; the actual and estimated data are
used. It is necessary to guess the equation of the curve of best fit
in order to know how many constants will enter into the equation,
because this fact is used in certain parts of the procedure. Table
LXXIII shows the process of computing the index of correlation.
The comparison of the sum of the Y values with the sum of the
Y' values shows the margin of error made in drawing the freehand
curve. If they were the same, the sum of the differences, z,
would be 0, but instead it is 9.1, the mean of which is .46. Since
the sum of the z values is a plus quantity, the mean of these values 
indicates that the freehand curve should be shifted up .46 units on 
the Y scale. When the freehand curve is used for estimating Y 
values, the regression equation may be written: 

$$Y = k + f(X)$$

in which k is the constant corresponding to a in the general equa- 
tion for the curve. This constant is the mean of the sum of the 
differences between Y and Y ', which in our problem is .46. The 
regression equation may then be written: 


$$Y = .46 + f(X)$$

in which f(X) may be read “factor of X.” To estimate a Y value,
then, and include the correction for the error made in drawing 
the freehand curve, we simply substitute any given value of Y as 
indicated by a point on the freehand curve and add to it .46. 
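
The column totals of Table LXXIII and the correction constant k can be reproduced directly from the X, Y, and Y' columns. The sketch below is an illustration in Python rather than part of the original procedure; the names `rows`, `k`, and `estimate` are assumptions introduced here. The Y' values are the readings from the freehand curve as printed in the table.

```python
# Columns X, Y, and Y' (read from the freehand curve) of Table LXXIII.
rows = [
    (4.6, 2.7, 1.5), (7.1, 2.6, 2.2), (6.8, 1.7, 2.1), (7.5, 2.9, 2.3),
    (5.1, 2.1, 1.7), (5.3, 1.6, 1.7), (9.8, 4.2, 2.8), (9.0, 2.5, 2.7),
    (13.5, 5.1, 3.7), (10.6, 1.3, 3.0), (16.3, 5.7, 4.2), (16.6, 3.6, 4.3),
    (15.9, 1.1, 4.2), (20.0, 4.9, 4.8), (23.3, 4.0, 5.2), (23.1, 13.8, 5.2),
    (22.0, 7.9, 5.1), (24.3, 7.9, 5.2), (33.9, 2.8, 5.4), (34.0, 3.3, 5.3),
]

n = len(rows)

# Residuals z = Y - Y' for each census tract.
z = [y - y_curve for _, y, y_curve in rows]

sum_z = sum(z)                          # total of the z column
k = sum_z / n                           # mean residual: the correction constant
sum_z_sq = sum(v * v for v in z)        # total of the z-squared column
sum_y_sq = sum(y * y for _, y, _ in rows)  # total of the Y-squared column


def estimate(curve_reading):
    """Corrected estimate of Y: the freehand-curve reading plus k."""
    return curve_reading + k


print(f"n = {n}, sum z = {sum_z:.1f}, k = {k:.3f}")
print(f"sum z^2 = {sum_z_sq:.2f}, sum Y^2 = {sum_y_sq:.2f}")
```

The mean residual comes out at .455, which the text rounds to .46; a curve reading of 4.8, for example, would be corrected to about 5.26.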

The totals of the columns in Table LXXIII give the quantities 
necessary to determine the degree of correlation between per cent 
of land used for business purposes and the felon rate. The neces- 
sary standard deviations may be obtained from the following 
formulas: 


$$\sigma_y = \sqrt{\frac{\Sigma Y^2 - n(M_y)^2}{n}}$$

$$\sigma_z = \sqrt{\frac{\Sigma (z)^2 - n(M_z)^2}{n}}$$





Substituting the appropriate values in these equations and solving 
we have: 




$$\sigma_z = \sqrt{\frac{122.79 - 4.23}{20}}$$


From the following formula the index of correlation, corrected 
for the number of observations, may be determined: 


Pyx' — 



Substituting the appropriate values in the equation, we have: 



The symbol m in the formula refers to the number of constants in
the equation of the curve of best fit; in this case it is the guessed
logarithmic curve drawn freehand. The index of correlation is
found to be .617, and, since the index of determination is simply
the square of the index of correlation, the index of determination
is .381. The latter represents the per cent of variance in Y which
is also present in X; in other words it accounts for 38.1 per cent
of the factors entering into the determination of Y.

The same method can be used in working out curvilinear cor- 
relation for data in which the curve of best fit is some other loga- 
rithmic curve, a hyperbola or a parabola. The constants and the 
standard deviations to be found would be the same, though in 
the case of some curves there will be three or more constants 
instead of two. 

It remains to compute the standard error of estimate. This is 
determined by the formula: 


y./(x) 


n — m 


Substituting the appropriate values in this equation, we have: 



TABLE LXXIV 

Correlation of the Sex Ratio and the Marriage of Women 1 



The limits of the standard error are represented by the broken 
lines in Figure LX. The chances are 2 to 1 that estimates of the 
value of Y by means of the regression equation above will fall 
within plus or minus the standard error of estimate, 2.56. This is 
a large standard error, but it would be reduced by more than a 
third if the one extreme item were eliminated. 

Sometimes the data one wishes to use in computing correlation 
are so numerous that it would be unnecessarily laborious to work 
out the correlation by exactly the method used in the preceding 
illustrations, or the data may already be in the form of frequency 
tables in which case it would be impossible to determine the sepa- 
rate items. It is, therefore, desirable to have a method of com- 
puting the degree of correlation from grouped data. A method 
for doing this from data whose relationship is indicated by a 
straight line will now be described. 

Data for illustrating the group method of correlation will be 
taken from some material collected by Professor W. F. Ogburn 
concerning marriage and the sex ratio. The data are presented in 
Table LXXIV in the form of a correlation table above. 

The symbols have the following meaning:

X = percentage of women 25 years of age and over who are married
Y = sex ratio, males per 100 females
$\Sigma F_x$ = sum of the frequencies in the columns
$\Sigma F_y$ = sum of the frequencies in the rows
$d_x$ = step-deviations from assumed mean of X
$d_y$ = step-deviations from assumed mean of Y
$\Sigma d_x F$ = algebraic sum of the x-deviations times the frequencies
$\Sigma d_y F$ = algebraic sum of the y-deviations times the frequencies
$d_y(\Sigma d_x F)$ = product of columns (2) and (3)
$d_x(\Sigma d_y F)$ = product of rows (2) and (3)
$d_x F_x$ = x-deviations times the sum of the frequencies in each column
$d_y F_y$ = y-deviations times the sum of the frequencies in each row
$d_x^2 F_x$ = x-deviations squared times the frequencies in each column
$d_y^2 F_y$ = y-deviations squared times the frequencies in each row


Before computing the coefficient of correlation and the regression
equation, certain correction factors must be computed for the deviations
from the mean. The corrections to be made are for $\Sigma d_y^2 F_y$,
$c_y$; for $\Sigma d_x^2 F_x$, $c_x$; and for the yx-products, $c_x$.
The corrected quantities are found in the following manner:





Let 


2 y 2 = 'Ldy^Fy — &dyF) — yf? ., corrected ^-squares 


^d x F u 


, corrected ^-squares 


2 

-y corrected y%-products 


Substituting in these equations the values appearing in Table 
LXXIV, we have the following results: 

o _ 

648.4 
1 70 

= 463 - (83) = 422.3 

170 

= 462 — (80) = 422.8 

170 

The correction for the yx-products is for the regression of Y on
X, that is, for the estimation of values of Y from known values
of X. If it were desired to estimate values of X from known
values of Y, then the correction factor to be used would be $c_y$ to
obtain the corrected xy-products. But ordinarily we are concerned
only with estimating values of the dependent variable from known
values of the independent variable. The corrected values, shown
above, are now substituted in the formula for computing the
coefficient of correlation:

T yx ~ 


+ 422.8 


= +.808 

The coefficient is quite high, which means that the correlation between
the percentage of women 25 years of age and over who are
married and the sex ratio is close and positive. The regression
equation will now be determined, and the first step is to compute
the constants a and b:

* 

= .652 intervals 

13 The quantities subtracted are the product of the sum of the products of the 
step-deviations times the frequencies and the mean deviation of each item from 
the assumed mean group in intervals. 





To reduce $b_{yx}$ to terms of scale units compute the ratio of the class-interval
of Y to the class-interval of X, as follows:


Multiplying .652 by this figure, we get 1.467 scale units for the
value of $b_{yx}$.

$$a = M_y - b_{yx} M_x = 104.7 - (1.467)(67.9) = 5.1$$

The regression equation is then:

$$Y = 5.1 + 1.467X$$

Using the same formula as previously used for the correction of
the coefficient of correlation for the number of items, we have:

$$\bar{r}_{yx}^{\,2} = 1 - (1 - r^2)\,\frac{n - 1}{n - m}$$

$$= 1 - (1 - .6529)\,\frac{170 - 1}{170 - 2} = .6502$$

$$\bar{r}_{yx} = .806$$

When the product-moment method of correlation is used, it is 
customary to write the coefficient plus or minus the probable error. 
The probable error of the coefficient of correlation above is com- 
puted below: 


$$PE_r = .6745\,\frac{1 - r^2}{\sqrt{n}} = .6745\,\frac{1 - .6496}{13.04} = \pm .018$$

The coefficient may then be written:

$$r_{yx} = +.806 \pm .018$$

It is usually held that, if the coefficient is 5 or 6 times the size of
its probable error, or still greater, it is significant. Since our coefficient
is many times greater than the probable error, we may
conclude that it is significant.

If the regression equation is used for estimating future values 
of Y, the estimates should be accompanied by the standard error 
of estimate. For this purpose, we may use the following formula: 



$$S_y = \sigma_y \sqrt{1 - r^2}$$

The standard deviation of Y, computed from $\Sigma d_y^2 F_y$, corrected,
is 17.4. Substituting in the formula, we have:

$$S_y = 17.4\sqrt{1 - .6496} = 10.3$$
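
The arithmetic of the last few steps can be retraced from the quantities printed in the text. The following Python sketch is offered only as a check on that arithmetic, not as the author's own worksheet; the variable names are assumptions, and the inputs (r = .808, b = 1.467, the two means, the standard deviation 17.4) are taken as given from the passages above. Because the printed intermediate figures are rounded, the corrected coefficient computed here may differ from the printed one in the third decimal place.

```python
import math

n = 170            # number of cities
m = 2              # constants in the straight-line equation Y = a + bX
r = 0.808          # coefficient of correlation from the grouped data
b = 1.467          # regression coefficient in scale units
mean_x = 67.9      # mean per cent of women married
mean_y = 104.7     # mean sex ratio
sigma_y = 17.4     # standard deviation of Y in scale units

# Constant a of the regression equation Y = a + bX.
a = mean_y - b * mean_x

# Coefficient corrected for the number of items.
r_sq_adjusted = 1 - (1 - r**2) * (n - 1) / (n - m)
r_adjusted = math.sqrt(r_sq_adjusted)

# Probable error and standard error of estimate, using the printed
# corrected coefficient .806 so that the figures match the text.
r_printed = 0.806
probable_error = 0.6745 * (1 - r_printed**2) / math.sqrt(n)
std_error_estimate = sigma_y * math.sqrt(1 - r_printed**2)

print(f"a = {a:.1f}")
print(f"corrected r = {r_adjusted:.3f}")
print(f"probable error = {probable_error:.3f}")
print(f"standard error of estimate = {std_error_estimate:.1f}")
```

The ratio of the coefficient to its probable error is about 44, far beyond the 5 or 6 usually required for significance.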

It should be noted that, if the product-moment method is used 
for computing the coefficient of correlation, it is not necessary to 
use the ordinary equation for regression. An alternative equation 
is available and is given below : 

$$Y - M_y = r\,\frac{\sigma_y}{\sigma_x}\,(X - M_x)$$

Much of the arithmetic involved in this equation has already been
done in the process of deriving the constants, a and b. This equation
has no special merit, and the equation, Y = a + bX, is in
more general use.


4. EXERCISES 

1. Below are given two tables. Experiment with different kinds
of curves and decide which is the best fit. Compute the equation 
of the curve in each case: 

TABLE LXXV

Divorced Persons per 1,000 Males and per 1,000 Females
over 15 Years of Age in Certain Census Tracts
of Indianapolis

Divorced Persons    Divorced Persons
      Male               Female

       2.2                 6.9
       2.2                 3.2
       2.8                13.7
       4.9                 9.1
       5.8                 9.4
       6.2                14.3
       6.4                13.4
       6.5                 7.7
       7.2                 5.4
       7.2                15.5
       8.0                13.0
       8.7                15.2
       9.1                12.9
       9.9                10.5
      11.7                27.9
      12.7                25.8
      14.8                28.1
      17.7                25.7
      18.8                16.1
      19.5                21.9




TABLE LXXVI

Amount of Relief per Relief Case and Amount of Relief
per Allowance Case in 20 Relief Agencies,
September, 1931 1

Relief per        Relief per
Relief Case       Allowance Case

   $11                $27
    15                 40
    18                 24
    18                 67
    19                 30
    20                 24
    20                 26
    25                 64
    25                 32
    26                 39
    27                 39
    28                 32
    29                 54
    31                 38
    33                 51
    35                 38
    37                 41
    40                 47
    44                 55
    48                 53

1 Monthly Reports, Department of Statistics, Russell Sage
Foundation.


Note: A relief case is any case which receives financial assistance from a social 
agency, but an allowance case is one for which a long-time plan has been made and 
usually contemplates a large expenditure of funds. Allowance cases usually constitute a 
small percentage of the total relief case load. 

2. The following table gives the number of police per 1,000 popu- 
lation and the number of serious crimes committed per 1,000 
population in the month of October, 1931, in 30 cities of 
250,000 or more: 


TABLE LXXVII

Police per 1,000 Population and Crimes per 1,000 Population
in 30 Cities, October, 1931 1

City                     Police per 1,000    Crimes per 1,000
                            Population          Population

Akron, O.                      .8                  1.3
Birmingham, Ala.              1.0                  1.8
Dallas, Tex.                  1.1                  1.5
Columbus, O.                  1.2                  2.7
Houston, Tex.                 1.2                  2.8
Minneapolis, Minn.            1.2                  1.0
St. Paul, Minn.               1.3                  1.1
Oakland, Cal.                 1.4                  1.8
Portland, Ore.                1.5                  3.4
Denver, Colo.                 1.5                  2.2
Cincinnati, O.                1.5                  1.9
Toledo, O.                    1.5                  2.9
Indianapolis, Ind.            1.6                  2.8
Louisville, Ky.               1.6                  2.0
Rochester, N. Y.              1.6                   .9
Kansas City, Mo.              1.7                  1.3
Cleveland, O.                 1.7                  2.0
New Orleans, La.              1.9                   .8
Chicago, Ill.                 2.0                  2.7
Milwaukee, Wis.               2.0                  1.0
Buffalo, N. Y.                2.2                   .7
San Francisco, Cal.           2.2                  2.2
Baltimore, Md.                2.3                  1.3
Providence, R. I.             2.4                  1.5
Detroit, Mich.                2.6                  1.8
Philadelphia, Pa.             2.8                   .6
St. Louis, Mo.                2.8                  1.8
Washington, D. C.             2.9                  3.2
Boston, Mass.                 3.3                  1.5
Jersey City, N. J.            3.6                   .4

1 Uniform Crime Reports, United States Department of Justice,
Vol. II, No. 10.


(a) Fit a curve to the data in Table LXXVII. 

(b) How important is the relation between police protection 
and number of crimes committed? Determine this by com- 
puting the degree of correlation which exists between the 
two series of data. Also compute the regression equation 
and the standard error of estimate. 

3. Table LXXVIII gives the Index of Educational Interest (i.e.,
the school attendance rate 7 to 13 years of age) and the per 
cent illiterate in the population 21 years of age or over in 36 
Texas Counties in 1920: 


TABLE LXXVIII

Index of Educational Interest and Index of Illiteracy, 36
Texas Counties, 1920 1

County            Index of                Index of
                  Educational Interest    Illiteracy

Carson                 94.1                  1.7
Camp                   93.7                 14.0
Angelina               93.1                  6.9
Cass                   91.9                 13.1
Bosque                 91.2                  4.0
Armstrong              90.9                  1.2
Bell                   90.1                  4.4
Cherokee               89.5                 10.6
Brown                  89.0                  2.0
Childress              87.8                  1.8
Clay                   87.2                  1.9
Brazoria               86.8                 13.3
Chambers               86.5                 10.2
Bowie                  86.4                 11.1
Burleson               86.4                 14.1
Anderson               86.1                  9.8
Brazos                 85.7                 16.7
Burnet                 85.5                  3.8
Austin                 85.2                  8.3
Castro                 84.4                  1.1
Arkansas               84.2                  8.0
Archer                 83.9                  2.0
Briscoe                82.0                  1.7
Bexar                  81.3                 13.3
Calhoun                81.3                  7.1
Bastrop                81.1                 15.9
Blanco                 79.7                  3.4
Baylor                 79.2                  1.4
Callahan               78.6                  2.9
Bandera                77.5                  3.4
Atascosa               65.4                 22.8
Bee                    63.2                 24.0
Cameron                61.8                 33.1
Brewster               57.3                 31.2
Caldwell               55.0                 26.2
Brooks                 45.4                 34.8

1 Ross, Frank A., School Attendance in the United States, 1920,
p. 210.

(a) Fit a curve to the data in this table. 

(b) Determine the degree of correlation, the index or coef- 
ficient of determination, the regression equation, and the 
standard error of estimate. 

(c) Show graphically the regression curve and the limits of
the standard error of estimate.

4. Table LXXIX gives data for computing the correlation between
the sex ratio in the population and the percentage of
women 25 years of age or over who are married. The method
for grouped data is required:

(a) Compute the coefficient of correlation for the data in this 
table. 

(b) Determine the regression equation for the dependent varia- 
ble and the standard error of estimate. 





TABLE LXXIX

The Number of Males per 100 Females and the Per Cent of Women Married in 170 Cities 1

(c) Show the regression line and the standard error of estimate 
graphically. 

5. In order that the student may gain practice in thinking in 

terms of functional relations, let each student obtain data: 

(a) Which show linearity and are ungrouped. 

(b) Which show curvilinearity and are ungrouped. 

(c) Which show linearity and are grouped. 

(d) In each case compute the degree of correlation, the regres- 
sion equation, the coefficient or index of determination, 
and the standard error of estimate. 

(e) In each case present graphically the regression equation 
and the standard error of estimate. 

5. REFERENCES 

Chaddock, Robert E., Principles and Methods of Statistics, Chap. XII.

Ezekiel, Mordecai, Methods of Correlation Analysis, Chaps. 3-9.

Mills, Frederick C., Statistical Methods, Chaps. X, XII, XIII.

Thurstone, L. L., The Fundamentals of Statistics, Chaps. 22-24.



CHAPTER XII 


The Theory of 
Probability 


1. INTRODUCTION

Statistics is concerned with chance variations, or probabilities. It 
is, therefore, not surprising that the first persons who became 
seriously interested in the theory of probability were gamblers. As 
early as the fifteenth century various European mathematicians 
were asked by gamblers to calculate the probabilities of winning 
in games of chance. The names of Pascal, Fermat, and Leibnitz 
appear among those consulted by gamblers. The first scientific 
treatise on the subject was written in Latin ; it was published 
November 12, 1733, by De Moivre. It approached the problem 
by the method of binomial expansion and was intended to be a 
guide to gamblers. In the early part of the eighteenth century 
astronomers became interested in probability, and the number of 
mathematicians interested in it increased. Among those who made 
important contributions to the subject were Laplace and Gauss. 
Serious interest in the theory of probability, then, had an empirical 
origin. Since it began to attract wide attention among mathema- 
ticians much work has been done on it, but in books on statistics 
the chief interest is still empirical. Natural and social phenomena 
seem to occur or vary according to the laws of probability ; hence, 
every step in social statistics involves the theory of probability. 1 

In reading the preceding chapters and working out the problems 
in connection with methods described, the student must have been 
aware that he was dealing with a chance distribution of measure- 
ments or counts. At all times it has been clear that a statistical 
result was a “probable result” within certain limits of variability. 

1 For a good summary of the history of the theory of probability, see Walker, 
Helen M., Studies in the History of Statistical Method , Chap. II. Baltimore: 
Williams & Wilkins, 1929. 


The measures of dispersion — quartile deviation, average deviation, 
and standard deviation — are frank admissions that an average is 
only the most likely value and that, in fact, any sample of data 
taken from a universe will show scatter above and below the 
average. In a distribution which approaches the symmetrical bell- 
shaped form 50 per cent of the values will fall between the median 
minus and plus once the quartile deviation ; that is, the chances, or 
probabilities, are even that any value selected at random will be 
neither less nor greater than the median minus and plus once the 
quartile deviation. In such a distribution 57.5 per cent of the 
values will fall between the average minus and plus once the 
average deviation. The corresponding limits of the standard devia- 
tion from the mean include 68.26 per cent of the values. Here we 
are speaking of chance, or probability, but it is chance with refer- 
ence to the specific data in hand and not with reference to the 
universe of data from which the sample was drawn. Normal 
probability is a concept derived from the distribution of all the 
values in the universe or upon a sample indefinitely large. Any 
particular sample must be referred to the normal distribution of 
the universe of data as its standard of accuracy. The standard error 
of estimate of a regression equation is likewise a measure of the 
chances of occurrence of an event. It involves the theory of proba- 
bility. Instead of saying that we can estimate the value of Y from 
a known value of X, we say that we can estimate the value of Y 
within certain limits of variability, or within the limits of its 
standard error. Obviously the smaller the standard error, the 
greater the reliability of estimates. 
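
The three percentages quoted above for a normal distribution can be verified numerically. The sketch below, in Python, is an illustration added for that purpose; the function name `coverage` is an assumption. It relies on two standard relations for the normal curve: the average deviation equals the standard deviation times the square root of 2/π, and the quartile deviation is approximately .6745 times the standard deviation.

```python
from math import erf, pi, sqrt


def coverage(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))


within_sd = coverage(1.0)              # mean plus and minus one standard deviation
within_ad = coverage(sqrt(2 / pi))     # mean plus and minus one average deviation
within_qd = coverage(0.6745)           # median plus and minus one quartile deviation

print(f"standard deviation: {100 * within_sd:.2f} per cent")
print(f"average deviation:  {100 * within_ad:.2f} per cent")
print(f"quartile deviation: {100 * within_qd:.2f} per cent")
```

The results agree with the figures in the text to within rounding in the second decimal place.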

The term “error” in statistics does not refer to mistakes. Mis- 
takes arise from hasty or careless work or from inaccurate percep- 
tion. To err means to wander from a path or a norm. In every 
universe of data there is a central value about which it is normal 
for the individual measures to err or wander. Errors, in this sense, 
can be determined mathematically. The probability of the occurrence
of an event of a certain magnitude is the chance out of a
finite number of possible events that the particular event will occur.

2. ELEMENTARY ILLUSTRATIONS OF PROBABILITY 

If a coin is tossed, one of two things may happen: the tail will 
turn up or the head will turn up. The chances are even that the 
coin will fall tail up or head up. How may this fact be expressed 
in symbols? Let p represent success, q represent failure, and n
represent the total ways in which the event may occur. A head
will be represented by a and a tail by b. Then, if a head may be
regarded as success and a tail as failure, the chances of a head
falling may be expressed thus:

$$p = \frac{a}{n}$$

and the chances of failure are:

$$q = \frac{b}{n}$$

Or,

$$p = \frac{1}{2}$$

And

$$q = \frac{1}{2}$$


But suppose that instead of there being only 2 possible events, 
there are 52, as there would be in drawing a particular card from 
a complete deck of cards. The chance of drawing a jack of hearts 
from a deck of cards is: 


$$p = \frac{a}{n} = \frac{1}{52}$$

The chance of drawing any heart from the deck would be 1/4,
because one-fourth of all the cards are hearts. The probability of
an event occurring is the ratio of the number of ways in which the
event may occur to the total number of possible events.

But suppose there are two alternatives out of a large number of 
possible events. What would be the chance of drawing either a 
jack of hearts or an ace of diamonds from a deck of cards? The 
probability of one or the other of these events happening is the 
sum of the separate probabilities and may be expressed thus: 

$$P = \frac{c}{n} + \frac{d}{n} = \frac{1}{52} + \frac{1}{52} = \frac{1}{26}$$

If we think of the drawing of these two cards as two separate 
withdrawals, we have a compound event. Neither is dependent 
upon the other, and two cards are to be drawn. Under such cir- 
cumstances the chance of drawing a jack of hearts and an ace of 
diamonds is the product of the probabilities: 


$$P = \frac{c}{n} \times \frac{d}{n} = \frac{1}{52} \times \frac{1}{52} = \frac{1}{2704}$$
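
The rules just illustrated (the simple ratio, the sum for either-or events, and the product for compound independent events) can be written out with exact fractions. The following Python sketch is an illustration only; the helper name `probability` and the variable names are assumptions. As in the text, the two withdrawals are treated as independent, each made from a complete deck.

```python
from fractions import Fraction


def probability(favorable, possible):
    """Ratio of favorable ways to the total number of equally likely ways."""
    return Fraction(favorable, possible)


# Tossing one coin: success (head) and failure (tail).
p = probability(1, 2)
q = 1 - p

# Drawing from a complete deck of 52 cards.
p_jack_of_hearts = probability(1, 52)
p_any_heart = probability(13, 52)

# Either a jack of hearts or an ace of diamonds: sum of the separate probabilities.
p_either = probability(1, 52) + probability(1, 52)

# Both cards in two independent withdrawals: product of the probabilities.
p_both = probability(1, 52) * probability(1, 52)

print(p, q, p_jack_of_hearts, p_any_heart, p_either, p_both)
```

Using `Fraction` rather than decimals keeps the results in the same form as the worked examples, so they can be compared term by term.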





What has been indicated in simple terms can be expressed in
general terms as the expansion of a binomial. Since the tossing of
a coin is about as nearly uncontrolled by any factors outside of
gravitation as any event is likely to be, we shall continue the coin
illustration. If we toss two coins, there are four possible
combinations of heads and tails:

1 head, 1 tail
2 heads
2 tails
1 tail, 1 head

What are the chances of securing two heads, no heads, and one
head? The chances of securing two heads are 1/4; of securing one
head, 2/4; of securing no heads, 1/4. Similarly, the chances of
securing a certain number of heads and tails could be determined if
5 coins or 10 coins were used. This is a problem in binomial
expansion. Using p and q with the same meaning as above, the
following binomial holds for 2 coins:

$$(p + q)^2 = p^2 + 2pq + q^2$$

Or, since the chances of a head or of a tail on any one coin are
1/2, we may express it with the numbers thus:

$$\frac{1}{4} + \frac{2}{4} + \frac{1}{4}$$

The number of coins determines the power of the binomial. If we
should use 5 coins, the binomial would be:

$$(p + q)^5 = p^5 + 5p^4q + 10p^3q^2 + 10p^2q^3 + 5pq^4 + q^5$$

and we should have the following if numbers are used:

$$\frac{1}{32},\ \frac{5}{32},\ \frac{10}{32},\ \frac{10}{32},\ \frac{5}{32},\ \frac{1}{32}$$

If we should throw the 5 coins 3,200 times (that is, 100 times the
32 possible combinations), the number of the above combinations
would be 100 times the numerator of each term in the expanded
binomial, or 100, 500, 1000, 1000, 500, and 100, respectively.
That is the theoretical distribution which would result. If it were
actually done, the numerators of the terms would vary some from
these even quantities. However, if the coins were thrown 10,000
times, the chances of a distribution proportionate to the numerators
of the terms in the expanded binomial would be good. The larger
the number of throws, the more closely to the theoretical
distribution the result is likely to be.
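
The terms of the expanded binomial can be generated for any number of coins. The sketch below, in Python, is an illustration and not part of the original text; the function name `binomial_terms` is an assumption. Each term is the binomial coefficient times the appropriate powers of p and q, and with 32 equally likely combinations for 5 coins, 3,200 throws give theoretical frequencies of 100, 500, 1000, 1000, 500, and 100.

```python
from fractions import Fraction
from math import comb


def binomial_terms(n, p=Fraction(1, 2)):
    """Terms of (p + q)**n, ordered from n successes down to 0 successes."""
    q = 1 - p
    return [comb(n, k) * p ** (n - k) * q ** k for k in range(n + 1)]


two_coins = binomial_terms(2)     # chances of 2, 1, and 0 heads
five_coins = binomial_terms(5)    # chances of 5, 4, 3, 2, 1, and 0 heads

# Theoretical frequencies when the 5 coins are thrown 3,200 times.
expected_counts = [int(term * 3200) for term in five_coins]

print(two_coins)
print(five_coins)
print(expected_counts)
```

The terms always sum to 1, and the number of terms is one more than the number of coins.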

An experiment to determine the nearness of actual successes to 
theoretical successes was made by Mr. W. F. R. Weldon. He took 
12 dice and threw them 4,096 times. A throw which turned up 4, 
5, or 6 points was regarded as success, and a throw which turned 
up 1, 2, or 3 was regarded as failure. This number of throws is 
sufficiently large to approach the theoretical distribution of suc- 
cesses. Table LXXX gives the number of successes for each throw 
and the frequencies: 

TABLE LXXX

Comparison of Actual and Theoretical Success Frequencies
in 4,096 Throws of 12 Dice 1

Number of        Frequency,      Frequency,
Successes          Actual        Theoretical

    0                 0               1
    1                 7              12
    2                60              66
    3               198             220
    4               430             495
    5               731             792
    6               948             924
    7               847             792
    8               536             495
    9               257             220
   10                71              66
   11                11              12
   12                 0               1

1 For the actual frequencies, see Yule, G. U., op. cit., p. 258, or
the Encyclopaedia Britannica, 11th ed., Vol. XXII, p. 394, article by
F. Y. Edgeworth.


In any particular throw of the 12 dice it is possible to have 0
successes or as many as 12 successes. The theoretical frequencies
represent the expansion of $(p + q)^{12}$. An examination of the
theoretical frequencies will reveal the fact that the distribution is 
perfectly symmetrical. The actual frequencies approach the the- 
oretical proportions, but they vary from them slightly at every 
point. In order to show more clearly the relation between the 
two distributions, they are presented graphically in Figure LXI: 
The two curves are quite similar. Obviously, if enough throws of 
the dice were made, the empirical curve would approach closer and 
closer to the form of the theoretical curve based upon the expan- 
sion of the binomial. Where either of two events may happen 



Figure LXI. Number of Successes (X) and Actual and Theoretical
Frequencies (Y) in 4,096 Throws of 12 Dice

and where no forces except chance operate, the law which describes 
their occurrence is the normal curve, as shown in Figure LXII. 

The theoretical mean is M = 6.0, and the theoretical standard
deviation is $\sigma_{12}$ = 1.732. The actual mean is M = 6.139, and the
actual standard deviation is $\sigma_{12}$ = 1.712. It is very simple to
determine the mean and the standard deviation of the theoretical
distribution. The formulas are as follows:

$$M_{12} = np$$

$$\sigma_{12} = \sqrt{npq}$$

in which n is the number of dice, and p and q have the same 
meaning as above. The same formula would be used for any 
number of dice which might be used. This number determines the 
power of the binomial, and the number of terms in the expanded 
binomial will be one more than the power to which the binomial 
is raised. 
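
The comparison of Weldon's actual frequencies with the theoretical ones can be reproduced in a few lines. The following Python sketch is an illustration; the function name `mean_and_sd` and the variable names are assumptions, and the actual frequencies are those of Table LXXX. The theoretical frequencies for 4,096 throws are simply the binomial coefficients of the 12th power, since 4,096 equals 2 raised to the 12th.

```python
from math import comb, sqrt

n_dice = 12
p = q = 0.5

# Frequencies for 0, 1, ..., 12 successes in 4,096 throws (Table LXXX).
actual = [0, 7, 60, 198, 430, 731, 948, 847, 536, 257, 71, 11, 0]
theoretical = [comb(n_dice, k) for k in range(n_dice + 1)]


def mean_and_sd(frequencies):
    """Mean and standard deviation of the number of successes."""
    total = sum(frequencies)
    mean = sum(k * f for k, f in enumerate(frequencies)) / total
    variance = sum(f * (k - mean) ** 2 for k, f in enumerate(frequencies)) / total
    return mean, sqrt(variance)


theoretical_mean, theoretical_sd = mean_and_sd(theoretical)
actual_mean, actual_sd = mean_and_sd(actual)

# The same theoretical values from the formulas M = np and sigma = sqrt(npq).
formula_mean = n_dice * p
formula_sd = sqrt(n_dice * p * q)

print(f"theoretical: M = {theoretical_mean:.3f}, sigma = {theoretical_sd:.3f}")
print(f"actual:      M = {actual_mean:.3f}, sigma = {actual_sd:.3f}")
```

The formulas and the direct computation from the theoretical frequencies give the same mean and standard deviation, which is why the shortcut np and the square root of npq may be used for any number of dice.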

The principal value of the theoretical curve lies in the fact that 
it provides a basis of generalization. Any sample taken from a 
universe of data which theoretically are distributed according to 
the normal curve will vary more or less from the smooth curve. 
That variance is a measure of the atypicality of the sample; there 
were chance fluctuations in the selections of the sample, or there 
was a bias which led to error. As previously suggested, this the- 
oretical curve is variously known as the normal curve of error, 
the bell-shaped curve, the perfectly symmetrical curve, or the
Gaussian curve. 


3. THE NORMAL CURVE OF ERROR 

Some further explanation of the normal curve of error is de- 
sirable in order to show its uses in practical statistical work. The 
concept of errors will be clearer if Figure LXII is examined. The 
diagram is made on the basis of rectangular coordinates to em- 
phasize the nature of statistical error — not statistical mistakes. 
In Figure LXII, YO indicates the value, that is, the mean, at 
which the largest number of frequencies occur in the normal dis- 
tribution, such as the theoretical distribution of successes in the 
coin throwing experiment. It may be referred to as the zero ordi- 
nate. Any X-value besides the mean will have less frequencies 
than the mean value of X. There are as many values of X less 
than the mean as there are values of X greater than the mean. 
Values of X to the right of O are plus values, and values of X 






to the left of O are minus values, with respect to the mean. In the 
dice throwing experiment the most likely result of any throw is 
6 successes. If 6 success values are not thrown, the chances are 
even that the number of successes will be above or below the 
mean. We could obtain a similar distribution of data if we meas- 
ured the heights of all schoolboys 12 years of age in a large city; 
their heights would be distributed approximately in the form of 
the normal curve — of the measures deviating from the mean 
height, half would likely be above the mean and half below. Any 
small sample of boys might reveal a height distribution varying 
considerably from the normal curve. Graphic comparison of the 
curve of the sample with the theoretical curve would indicate 
roughly the degree of agreement. 

We can, however, determine the degree of similarity between 
a given and a normal frequency distribution by the method of 
moments. The procedure for fitting a theoretical curve will be 
described later, but at this point the computation of the moments 
of a frequency distribution will be illustrated. The following table 
gives the data required and the first arithmetical step: 

TABLE LXXXI 

Computation of Values Required for the Determination of Moments: 
Intelligence Test Data 1 

Class-       Mid-     Fre-     Step-
Interval     Point    quency   Deviations
   X           m        f          x         fx     f(x)^2    f(x)^3    f(x)^4
  (1)         (2)      (3)        (4)        (5)      (6)       (7)       (8)

 50- 59.9      55       11        -5         -55      275     -1375      6875
 60- 69.9      65       59        -4        -236      944     -3776     15104
 70- 79.9      75      149        -3        -447     1341     -4023     12069
 80- 89.9      85      256        -2        -512     1024     -2048      4096
 90- 99.9      95      328        -1        -328      328      -328       328
100-109.9     105      352         0
110-119.9     115      249         1         249      249       249       249
120-129.9     125      165         2         330      660      1320      2640
130-139.9     135       68         3         204      612      1836      5508
140-149.9     145       22         4          88      352      1408      5632
150-159.9     155        8         5          40      200      1000      5000
160-169.9     165        2         6          12       72       432      2592
170-179.9     175        2         7          14       98       686      4802

Total                 1671                  -641     6155     -4619     64895

1 Goodenough, Florence L., Measurement of Intelligence by Drawings. Yonkers: World 
Book Co., 1926. See p. 46, Table 8, last column. 


The sums of the four columns containing the products of the 
frequencies and powers of x are known as the moments of the 
distribution about an arbitrary origin. The term “moment” is bor- 
rowed from mechanics and refers to the force required to produce 
rotation about a point. The greater the distance of the application 
of the force from the axis of rotation, the greater the power of 
the force. In statistics the frequencies of the various class-intervals 
are regarded as the forces, and the axis of rotation is the arbitrary 
origin from which the step-deviations are measured. 

The moments about the arbitrary origin are computed as 
follows: 

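$$\nu_1 = \frac{\sum f(x)}{n} = \frac{-641}{1671} = -.383, \text{ the first moment}$$

$$\nu_2 = \frac{\sum f(x)^2}{n} = \frac{6155}{1671} = 3.683, \text{ the second moment}$$

$$\nu_3 = \frac{\sum f(x)^3}{n} = \frac{-4619}{1671} = -2.764, \text{ the third moment}$$

$$\nu_4 = \frac{\sum f(x)^4}{n} = \frac{64895}{1671} = 38.836, \text{ the fourth moment}$$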

But it is not the moments about the arbitrary origin which are of 
most importance: it is the moments about the mean. These are 
computed in the following manner: 

$$\pi_1 = 0, \text{ first moment about the mean}$$

$$\pi_2 = \nu_2 - \nu_1^2 = 3.536, \text{ second moment about the mean}$$

$$\pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3 = 1.176, \text{ third moment about the mean}$$

$$\pi_4 = \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4 = 37.721, \text{ fourth moment about the mean}$$

W. F. Sheppard has shown that, because of the grouping into 
class-intervals, certain corrections should be made in the second 
and fourth moments. The corrected moments are as follows: 

$$\mu_1 = 0$$

$$\mu_2 = 3.536 - \tfrac{1}{12} = 3.453$$

$$\mu_3 = 1.176$$

$$\mu_4 = 37.721 - \tfrac{1}{2}\pi_2 + \tfrac{7}{240} = 35.982$$

From the corrected moments we obtain two other functions which 
enable us to determine whether or not the distribution is of the 
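type of the normal curve. These are determined as follows: 

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(1.176)^2}{41.174} = .034$$

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{35.982}{11.923} = 3.018$$

For the normal curve these functions are: 

$$\beta_1 = 0, \qquad \beta_2 = 3$$
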
It is, therefore, clear that the distribution of intelligence quotients 
approaches closely to the type of the normal curve. 
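
The computation of the moments can be retraced with a short program. The following is only a sketch in Python, under the assumption that the step-deviations and frequencies are those of Table LXXXI; the variable names are illustrative and are not taken from the text. The values it prints for the third and fourth moments about the mean need not agree with the printed figures, which were worked by hand.

```python
# Step-deviations x and frequencies f, as read from Table LXXXI.
x = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7]
f = [11, 59, 149, 256, 328, 352, 249, 165, 68, 22, 8, 2, 2]
n = sum(f)


def raw_moment(k):
    """k-th moment about the arbitrary origin: sum(f * x**k) / n."""
    return sum(fi * xi**k for fi, xi in zip(f, x)) / n


v1, v2, v3, v4 = (raw_moment(k) for k in (1, 2, 3, 4))

# Moments about the mean, from the identities given in the text.
pi2 = v2 - v1**2
pi3 = v3 - 3 * v1 * v2 + 2 * v1**3
pi4 = v4 - 4 * v1 * v3 + 6 * v1**2 * v2 - 3 * v1**4

# Sheppard's corrections for grouping (class width of one step).
mu2 = pi2 - 1 / 12
mu3 = pi3
mu4 = pi4 - pi2 / 2 + 7 / 240

# The two functions compared with 0 and 3 for the normal curve.
beta1 = mu3**2 / mu2**3
beta2 = mu4 / mu2**2

print("n =", n)
print("moments about origin:", [round(v, 3) for v in (v1, v2, v3, v4)])
print("moments about mean:  ", [round(p, 3) for p in (pi2, pi3, pi4)])
print("corrected moments:   ", [round(m, 3) for m in (mu2, mu3, mu4)])
print("beta1, beta2:        ", round(beta1, 3), round(beta2, 3))
```

A value of beta1 near 0 together with a value of beta2 near 3 is what the text takes as evidence that the distribution is of the normal type.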

It is worth while noting that the standard deviation in intervals 
of the distribution is equal to the square root of π2. Thus: 

$$\sigma = \sqrt{\pi_2} = \sqrt{3.536} = 1.88$$

However, it is desirable to have a method of fitting a theoretical 
curve to actual data which seem to conform to the normal dis- 
tribution. This can be easily done by reference to a table of 
integrals, because the height of any ordinate above or below the 
zero, or maximum, ordinate bears a definite relation to the height 
of this maximum ordinate. This ordinate is called the maximum 
ordinate because it represents the greatest number of frequencies 
of any ordinate that can be drawn. The most common equation for 
the normal curve is: 

$$y = y_0 e^{-\frac{x^2}{2\sigma^2}}$$

in which y is the particular ordinate desired; y0 is the maximum 
ordinate; e is a constant with the value of 2.7182818 (the base of 
the Napierian logarithms); x is the value of the independent 
variable for which the ordinate is to be determined; and σ is the 
standard deviation of the data. The use of this formula is rather 
complicated. If the maximum ordinate is known, the relative size 
of other ordinates may be read from a table of integrals, and the 
computation is then simple. The formula for determining the 
maximum ordinate is: 


$$y_0 = \frac{n}{\sigma\sqrt{2\pi}}, \quad \text{or,} \quad y_0 = \frac{n}{2.5066\sigma}$$

In this formula σ should be expressed in intervals. The heights 
of other ordinates may be read from Table CXXI in Appendix A. 
The height of the ordinate is determined by its distance in terms 
of standard deviation from the mean. For example, if it is desired 
to know the height of the ordinates .5σ above and below the 
maximum ordinate, we look at Table CXXI and find .5. To the 
right in the first column is the number 88250, or it is 88.250 per 
cent of the height of the maximum ordinate. Knowing the 
frequencies represented by the maximum ordinate, we take 88.250 
per cent of these frequencies, and the result is the frequencies of 
the ordinates above and below the maximum ordinate at .5σ 
removed. In a similar manner other ordinates can be computed. For 
purposes of illustration some intelligence test data will be used. 
They are given in the following table: 

TABLE LXXXII 

I.Q’s of 1,671 Children, Ages 6 to 12 

                Number of
   I.Q.         Children

 50- 59.9           11
 60- 69.9           59
 70- 79.9          149
 80- 89.9          256
 90- 99.9          328
100-109.9          352
110-119.9          249
120-129.9          165
130-139.9           68
140-149.9           22
150-159.9            8
160-169.9            2
170-179.9            2


TABLE LXXXIII 

Fractions of Sigma, Ratio of y to y0, and Theoretical Frequencies 
for the Normal Curve 

Deviations from Mean           Normal Curve
 in Fractions of σ           y/y0          y

        .0                  1.0000        355
        .2                   .9802        348
        .4                   .9231        328
        .6                   .8353        296
        .8                   .7262        258
       1.0                   .6065        215
       1.2                   .4868        173
       1.4                   .3753        133
       1.6                   .2780         99
       1.8                   .1979         70
       2.0                   .1353         48
       2.2                   .0889         32
       2.4                   .0561         20
       2.6                   .0341         12
       2.8                   .0198          7
       3.0                   .0111          4
       4.0                   .0003          0.1
       5.0                   .0000          0.0


It will be seen that these data are distributed approximately in the 
form of a normal frequency curve. How close does this distribu- 
tion approach the normal distribution of I.Q’s of the same mean 
and the same standard deviation? Table LXXXIII gives the the- 
oretical frequencies of the fitted normal curve for the I.Q. data. 
The symbol y represents any particular theoretical frequency, and 
y0 represents the theoretical maximum ordinate. Each of the 
frequencies below the frequency of the maximum ordinate in this 
table will appear above and below the maximum ordinate in the 
complete distribution and in the graph of the curve; that is, we 
have to use both plus and minus fractions of the standard deviation 
as measured from the mean ordinate. Figure LXIII shows how 
actual and theoretical distributions compare. 
The curves are quite similar, but they coincide at only a few points. 
The difference may be explained in either of two ways: (1) the 
failure to fit may be due to chance variations in the sample, which 
would be eliminated if a large number of I.Q’s were taken; (2) or 
it may be that I.Q’s are not distributed according to the normal 
curve. This question can be answered, but the distribution of fre- 
quencies must be recalculated in terms of the area of the frequency 
polygon. If the fit of the curve is sufficiently close, it is reasonable 
to conclude that I.Q’s are distributed according to the normal 
curve and that the fluctuations are due to errors in sampling. 
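
The ordinates of Table LXXXIII can also be computed directly from the equation of the normal curve instead of being read from a table. The sketch below is in Python and assumes n = 1,671 and a standard deviation of 1.88 class-intervals, as in the intelligence test data; the names y0, ratio, and steps are illustrative only.

```python
import math

n = 1671        # number of cases
sigma = 1.88    # standard deviation, in class-intervals

# Maximum ordinate: y0 = n / (sigma * sqrt(2 * pi)).
y0 = n / (sigma * math.sqrt(2 * math.pi))


def ratio(t):
    """Height of the ordinate t standard deviations from the mean, as a fraction of y0."""
    return math.exp(-t**2 / 2)


# Deviations used in Table LXXXIII: 0 to 3.0 by steps of .2, then 4.0 and 5.0.
steps = [0.2 * k for k in range(16)] + [4.0, 5.0]

for t in steps:
    print(f"{t:4.1f}   {ratio(t):.4f}   {y0 * ratio(t):7.1f}")
```

Each height applies both above and below the mean, since the curve is symmetrical.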

The computation of frequencies in terms of the area of the 
frequency polygon is somewhat more laborious than their computa- 
tion in terms of the maximum ordinate, but the test of goodness 
of fit is in terms of the former. It was indicated in Table 
LXXXIII that the maximum, or zero, ordinate is unity, or 100.0 
per cent. Likewise, the total area of the frequency polygon is re- 
garded as unity. The object of the computations is to determine 
the proportion of frequencies in the area enclosed by the maximum 
ordinate and any other ordinate above or below it. After the devia- 
tions from the mean in intervals are determined and expressed as 
fractions of the standard deviation, the proportion of frequencies 
between the maximum and any other ordinate may be found in 
Appendix A, Table CXXII. Table LXXXIV shows the method 
of computing the theoretical distribution of the 1,671 I.Q’s. 

Figure LXIII. Normal Curve Determined from Ordinates Expressed as Fractional Parts of the Maximum Ordinate 
(vertical axis: frequencies; curves: theoretical and actual) 


TABLE LXXXIV 

Computation of Theoretical Frequencies for 1,671 I.Q’s 

Class-       Class Limits    Deviations  Deviations  Proportion  Cases
Intervals                    from Mean   from Mean   of Area     between
in I.Q’s     Lower   Upper   in          ÷ σ in      between y0  y0 and
                             Intervals   Intervals   and         Ordinate
                                                     Ordinate    (N = 1671)
    X                            x          x/σ        y/y0          y           f

Below 40                                              .5000       835.50       1.00
 40- 49.9      40              -6.12      -3.26       .4994       834.50       4.52
 50- 59.9      50              -5.12      -2.72       .4967       829.98      18.38
 60- 69.9      60              -4.12      -2.19       .4857       811.60      48.96
 70- 79.9      70              -3.12      -1.71       .4564       762.64     141.38
 80- 89.9      80              -2.12      -1.13       .3718       621.28     249.63
 90- 99.9      90              -1.12       -.59       .2224       371.63     331.69
100-109.9     100               -.12       -.06       .0239        39.94
                      110        .88        .46       .1772       296.10     336.04
110-119.9             120       1.88       1.00       .3413       570.31     274.21
120-129.9             130       2.88       1.53       .4370       730.23     159.92
130-139.9             140       3.88       2.06       .4803       802.58      72.35
140-149.9             150       4.88       2.60       .4953       827.65      25.07
150-159.9             160       5.88       3.13       .4991       834.00       6.35
160-169.9             170       6.88       3.66       .4999       835.33       1.33
Above 170                                             .5000       835.50        .17

Total                                                                       1671.20

M = 101.2     σ = 18.8, units of I.Q. 
                = 1.88, class-intervals 

The value of y is obtained by multiplying 1,671 by the value of 
y/y0 for each class-interval. The total of the theoretical 
distribution is two-tenths more than the total of the actual 
frequencies. If the ratio of y to y0 had been carried to one or two 
more decimal 
places, the totals should have been identical. However, this 
variation of .2 does not materially affect the size of the 
frequencies. The last column is obtained from the y’s. The maximum 
ordinate is determined by adding 39.94 and 296.10, the frequencies 
in the two parts of the class-interval, 100-109.9. The frequencies 
in this class-interval are in two parts, because one part is 
below the mean and one part is above. The frequency in the 
class-interval, 90-99.9, is found by subtracting 39.94 from 371.63, 
and the other frequencies below the mean are found in a similar 
manner by subtracting from the given y-value the y-value 
immediately below it in the table. The frequency for the 
class-interval, 110-119.9, is found by subtracting 296.10 from 
570.31. The other frequencies above the mean are found by 
subtracting from the given y-value the y-value immediately above 
it. The results are given as f in the last column. It should be 
noted that x, which is in terms of intervals, should be divided by 
σ in terms of intervals. 
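
The redistribution of the cases by area can be sketched in Python as follows. This is an illustration only: it assumes the mean of 101.2 and the standard deviation of 18.8 I.Q. points given with Table LXXXIV, and it obtains the areas from the error function in the standard library rather than from Table CXXII. Because it does not round x/σ to two decimals, its frequencies differ somewhat from those printed in the table. The names area_below, limits, and freqs are hypothetical.

```python
import math

N = 1671
mean = 101.2    # in I.Q. points
sigma = 18.8    # in I.Q. points


def area_below(limit):
    """Proportion of the area of the normal curve lying below a class limit."""
    z = (limit - mean) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Class limits 40, 50, ..., 170, with an open class at each end.
limits = list(range(40, 180, 10))
cum = [0.0] + [area_below(b) for b in limits] + [1.0]
labels = ["Below 40"] + [f"{a}-{a + 9}.9" for a in limits[:-1]] + ["Above 170"]

# Theoretical frequency of a class: N times the area between its limits.
freqs = [N * (hi - lo) for lo, hi in zip(cum, cum[1:])]

for label, freq in zip(labels, freqs):
    print(f"{label:>10}  {freq:8.2f}")
print(f"{'Total':>10}  {sum(freqs):8.2f}")
```

Computed in this way the theoretical frequencies necessarily add to N, so that no discrepancy from rounding appears in the total.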





Figure LXIV presents the actual and the theoretical frequencies. 
As in Figure LXIII, it is clear that the normal curve is not an 
exact fit for the data. The problem now is to determine whether 
or not it is a sufficiently close fit to justify the conclusion that 
I.Q’s are distributed according to the normal curve of error. 

It was pointed out above that the standard deviation may be 
determined from the formula: 

$$\sigma = \sqrt{npq}$$

in which n is the number of events, p the probability of success, 
and q the probability of failure. In dealing with a frequency 
distribution the formula has to be altered somewhat, as follows: 

$$p = \frac{f}{N}$$

in which f is the theoretical frequency at a given point on the 
X-scale and N is the total number of items. Then, let 

$$q = \frac{N - f}{N}$$

and substitute N for n in the general formula, as follows: 

$$\sigma_s = \sqrt{N \cdot \frac{f}{N} \cdot \frac{N - f}{N}} = \sqrt{\frac{f(N - f)}{N}}$$

This is called the standard error of sampling. 

We may now set up a table to show the differences between 
actual and theoretical frequencies. 

TABLE LXXXV 

Differences between Actual and Theoretical Frequencies 

Class-Interval    Actual Frequency    Theoretical Frequency    Differences
      m                 f_o                     f                f_o - f

      55                 11                   18.38               -7.38
      65                 59                   48.96               10.04
      75                149                  141.38                7.62
      85                256                  249.63                6.37
      95                328                  331.69               -3.69
     105                352                  336.04               15.96
     115                249                  274.21              -25.21
     125                165                  159.92                5.08
     135                 68                   72.35               -4.35
     145                 22                   25.07               -3.07
     155                  8                    6.35                1.65
     165                  2                    1.33                 .67

The absolute differences are not large, but the size of some of 
the differences relative to the frequencies is fairly large. We shall 
employ the formula for the standard error of sampling to see the 
significance of two of the variations. Let us take the first 
class-interval and the class-interval 110-119.9: 

$$\sigma_s = \sqrt{\frac{18.38(1671 - 18.38)}{1671}} = 4.26$$

The standard error of sampling is 4.26. Since the difference 
between the actual and the theoretical frequency is -7.38, the 
deviation from the mean is 1.7 times σ_s. If we consult Appendix A, 
Table CXXII, we find that, when x/σ is 1.7, the proportion of the 
total area of the frequency polygon included between the maximum 
ordinate and an ordinate erected at 1.7σ is .4554. The area 
included between the ordinates erected at 1.7σ above and below the 
maximum would be equal to 91.08 per cent of the total area. The 
chances are about 9 out of 100 that a given value will differ from 
the mean by more than 1.7σ. This is a fairly large deviation. Let 
us try another class-interval: 

$$\sigma_s = \sqrt{\frac{274.21(1671 - 274.21)}{1671}} = 15.14$$

The standard error of the sample divides into the difference 
between the actual and the theoretical frequencies 1.7 times. 
Referring to Table CXXII in Appendix A, it is seen that ±1.7σ 
from the maximum ordinate would include 91.08 per cent of all 
the frequencies, or, the chances are that about 9 times out of 100 
a given value would differ from the mean by more than 1.7σ. 
This still suggests a rather wide variation, though the 
difference might be due to fluctuations of sampling. 
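
The two computations just made may be checked with a few lines of Python. The sketch assumes the actual and theoretical frequencies of the two class-intervals as given in Table LXXXV; the function name sampling_error is illustrative.

```python
import math


def sampling_error(f, N):
    """Standard error of sampling for a theoretical class frequency f out of N items."""
    return math.sqrt(f * (N - f) / N)


N = 1671
cases = [("50-59.9", 11, 18.38), ("110-119.9", 249, 274.21)]

for label, actual, theoretical in cases:
    se = sampling_error(theoretical, N)
    ratio = (actual - theoretical) / se
    print(f"{label:>10}  sigma_s = {se:6.2f}   difference / sigma_s = {ratio:5.2f}")

# Proportion of the area of the normal curve within 1.7 sigma of the mean.
within = math.erf(1.7 / math.sqrt(2))
print(f"area within 1.7 sigma = {within:.4f}")
```

In both class-intervals the difference is about 1.7 times its standard error, and about 91 per cent of the area lies within that distance of the mean.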

If other class-intervals were used for computing the standard 
error of sampling, we should probably get some variation from 
the two already computed. Some method is needed by which 
account may be taken of all the class frequencies. Karl Pearson 
has developed a method which is known as the Chi-Square Test 
of Goodness of Fit. Table LXXXVI gives the data and the 
computations necessary for determining χ². χ² is the sum of the 
squares of the differences between the actual and the theoretical 
frequencies divided by the theoretical frequencies. 

TABLE LXXXVI 

Computation of χ² 

Class-          Actual         Theoretical
Intervals       Frequencies    Frequencies                    (f_o - f)^2 / f
    X              f_o              f           f_o - f

Below 60            11            23.90          -12.90            6.96
 60- 69.9           59            48.96           10.04            2.06
 70- 79.9          149           141.38            7.62             .41
 80- 89.9          256           249.63            6.37             .16
 90- 99.9          328           331.69           -3.69             .04
100-109.9          352           336.04           15.96             .76
110-119.9          249           274.21          -25.21            2.32
120-129.9          165           159.92            5.08             .16
130-139.9           68            72.35           -4.35             .26
140-149.9           22            25.07           -3.07             .38
Above 150           12             7.85            4.15            2.19

Total             1671          1671.00                           15.70


χ² is 15.70. From Elderton’s table we find that when n′, the 
number of class-intervals, equals 11 and χ² equals 15, the 
probability integral is .132061, and when n′ is 11 and χ² is 16, the 
probability integral is .099632. 2 The value of χ² in our problem 
lies between these two values; so it is necessary to interpolate to 
find the exact value of the probability integral. It proves to be 
.109461. This means that out of 100 samples of I.Q’s, the same size 
as the one used here, the chances are that about 10.9 would vary 
farther from the normal curve than the present sample. Two 
inferences from this fact follow in so far as the present sample is 
concerned. First, the fact that only about 11 per cent of other 
samples would vary farther from the theoretical curve than our 
sample suggests that ours is not a very good one, as samples of 
I.Q’s go, because about 89 per cent of other samples would be 
nearer to a normal distribution. Second, in view of the fact that 
the present sample approaches the form of a normal distribution 
and yet compared with other samples is not a very good one, it 
seems reasonable to conclude that I.Q’s are distributed according 
to the normal curve and that the theoretical curve fits the 
distribution. 
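
The sum in Table LXXXVI and the probability read from Elderton’s table can be approximated in Python. The sketch below makes two assumptions that should be kept in mind: the number of degrees of freedom is taken as n′ - 1 = 10, following the way the table is entered in the text, and the probability is obtained from the closed form that holds when that number is even, not by interpolation in the printed table. The function name tail_probability is hypothetical.

```python
import math

# Actual and theoretical frequencies from Table LXXXVI.
actual = [11, 59, 149, 256, 328, 352, 249, 165, 68, 22, 12]
theoretical = [23.90, 48.96, 141.38, 249.63, 331.69, 336.04,
               274.21, 159.92, 72.35, 25.07, 7.85]

# Chi-square: squared differences divided by the theoretical frequencies, summed.
chi_square = sum((fo - f) ** 2 / f for fo, f in zip(actual, theoretical))


def tail_probability(chi2, classes):
    """Chance of a chi-square at least this large, with classes - 1 degrees of freedom.

    Uses the closed form exp(-h) * sum(h**k / k!) with h = chi2 / 2,
    which is valid only when the number of degrees of freedom is even.
    """
    df = classes - 1
    if df % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = chi2 / 2
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))


p = tail_probability(chi_square, len(actual))
print(f"chi-square = {chi_square:.2f}")
print(f"probability = {p:.4f}")
```

Linear interpolation between the tabled entries for 15 and 16 gives a slightly larger probability than the direct computation, because the probability integral is not a straight line between those two points.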


2 Pearson, Karl, Tables for Statisticians and Biometricians. London: 
Cambridge University Press, 1924. 





4. ESTIMATION OF ERROR IN SAMPLES 

The tests of goodness of fit reduce the alternative explanations 
of the variations of the actual from the theoretical data to two: 
if the theoretical curve does not closely fit the actual data, the 
explanation may be that the sample is not representative of the 
universe of data, or it may be that this universe of data is not 
distributed according to the theoretical curve selected. Successive 
samples may be taken and compared with the theoretical distribu- 
tion. If the standard errors of the samples determined from the 
formula 


$$\sigma_s = \sqrt{\frac{f(N - f)}{N}}$$

are not uniformly too large, it is reasonable to assume that an 
indefinitely large sample selected at random would approach 
closely to the theoretical distribution. On the other hand, if the 
standard error of sampling is persistently so large that estimates 
based upon it would be meaningless, the chances are that the dis- 
tribution has a form different from the curve selected to represent 
the data. 

There are two measures of reliability in use: the probable error 
and the standard error. The probable error is based upon the 
quartile deviation, and the standard error is based upon the 
standard deviation. Both are equally good as measures of reliability. 
That is obvious from the fact that there is a constant relation be- 
tween the two in a distribution which conforms to the normal curve 
of error. The probable error is .6745 of the standard deviation in 
a normal distribution. For this reason error computed in terms 
of one measure may be reduced to terms of the other. However, 
there are not equally good practical reasons for using the two 
measures. The standard error is more commonly used, and most 
of the published tables used in fitting curves to data have been 
computed in terms of the standard deviation. Hence, it is of 
practical importance for the student to understand clearly the 
standard error. Besides measures of error of samples, there are 
measures of error for other statistical measures. Several of the 
more common ones will be described. 

It was stated above that one may test the representativeness of 
a sample by taking successive samples and comparing them. This 
is undoubtedly the best method. But it is laborious and requires a 
great deal of time. Frequently it is not practicable. In such cases 
the standard error of sample means, standard deviations, etc., may 
be determined from the mean, standard deviation, or other known 
measure. This shows within what limits these measures for other 
samples of the same size might be expected to vary. 

For example, the standard error of the mean may be deter- 
mined from this formula: 


$$\text{S.E.}_M = \frac{\sigma}{\sqrt{n}}$$


If we substitute in this formula the appropriate values obtained 
from the intelligence test problem, we have: 


$$\text{S.E.}_M = \frac{18.8}{\sqrt{1671}} = \pm .46$$


The chances are 2 to 1 that the mean of any other sample of the 
same size would not be less than 100.74 or greater than 101.66. 
That is a small range of fluctuation. The mean should be written 
101.2 ± .46. That shows clearly, then, the limits of probable 
variation. If it were desired to use the probable error instead of 
the standard error, the formula would be: 


$$\text{P.E.}_M = .6745 \frac{\sigma}{\sqrt{n}}$$

And the substitutions would be 

$$\text{P.E.}_M = .6745 \frac{18.8}{\sqrt{1671}} = \pm .310$$

It is obvious that the only thing done was to multiply the stand- 
ard error by the constant, .6745. The probable error gives the 
range within which the chances are 1 to 1 that the mean of any 
other sample will not be less than 100.890 nor greater than 
101.510. We have simply included a smaller proportion of the 
area of the frequency polygon within the limits of the measure 
of error. The standard error and the probable error are not to 
be contrasted. The first simply accounts for the range within 
which two-thirds of the cases will likely fall, whereas the other 
accounts for the range within which one-half of the cases will 
likely fall. 
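
The standard error and the probable error of the mean, with the limits they imply, can be verified in Python. This is a minimal sketch assuming the mean, standard deviation, and number of cases of the intelligence test data; the variable names are illustrative.

```python
import math

mean = 101.2    # mean I.Q.
sigma = 18.8    # standard deviation, in I.Q. points
n = 1671        # number of cases

se_mean = sigma / math.sqrt(n)    # standard error of the mean
pe_mean = 0.6745 * se_mean        # probable error of the mean

print(f"S.E. = {se_mean:.2f}, limits {mean - se_mean:.2f} to {mean + se_mean:.2f}")
print(f"P.E. = {pe_mean:.3f}, limits {mean - pe_mean:.3f} to {mean + pe_mean:.3f}")
```

The probable error is always the narrower of the two ranges, since it is the standard error multiplied by a constant less than one.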





Similarly the standard error and the probable error of the 
median or of either quartile may be determined by multiplying 
the standard error by the appropriate constant: 


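$$\text{S.E.}_{Md} = 1.2533 \frac{\sigma}{\sqrt{n}}$$

$$\text{P.E.}_{Md} = .8454 \frac{\sigma}{\sqrt{n}}$$

$$\text{S.E.}_{Q_1} = 1.3626 \frac{\sigma}{\sqrt{n}}$$

$$\text{P.E.}_{Q_1} = .9191 \frac{\sigma}{\sqrt{n}}$$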


The standard error and the probable error for the third quartile 
are the same as for the first quartile. 

The standard error of the standard deviation for a distribution 
conforming to the normal curve is: 

$$\text{S.E.}_\sigma = \frac{\sigma}{\sqrt{2n}}$$

Substituting the appropriate values from the intelligence test data, 
we have: 

$$\text{S.E.}_\sigma = \frac{18.8}{\sqrt{2(1671)}} = .325$$

Thus, the standard deviation should be written 18.8 ± .325. The 
chances are 2 to 1 that any other sample selected would have a 
standard deviation between 18.475 and 19.125. This formula is 
accurate only for a normal distribution. For a skewed distribution 
the following formula may be used: 

$$\text{S.E.}_\sigma = \sqrt{\frac{\mu_4 - \mu_2^2}{4\mu_2 n}} = .0323, \text{ intervals}$$

$$= .323, \text{ points}$$

This standard error of the standard deviation differs somewhat 
from the other. That is to be expected, because we have previously 
shown that the present sample of I.Q’s varies considerably from a 
normal distribution. 
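
The standard errors of the median, the quartiles, and the standard deviation may be worked out in the same way. The Python sketch below assumes the constants given above and the corrected moments of the intelligence test data (3.453 and 35.982, in class-intervals); the variable names are hypothetical.

```python
import math

sigma = 18.8                 # standard deviation, in I.Q. points
n = 1671                     # number of cases
mu2, mu4 = 3.453, 35.982     # corrected moments, in class-intervals

base = sigma / math.sqrt(n)  # standard error of the mean

se_median = 1.2533 * base
pe_median = 0.8454 * base
se_quartile = 1.3626 * base
pe_quartile = 0.9191 * base

# Standard error of the standard deviation, normal distribution.
se_sigma_normal = sigma / math.sqrt(2 * n)

# Standard error of the standard deviation, skewed distribution (class-intervals).
se_sigma_skewed = math.sqrt((mu4 - mu2**2) / (4 * mu2 * n))

print(f"median:   S.E. = {se_median:.3f}, P.E. = {pe_median:.3f}")
print(f"quartile: S.E. = {se_quartile:.3f}, P.E. = {pe_quartile:.3f}")
print(f"sigma, normal formula: {se_sigma_normal:.3f} points")
print(f"sigma, skewed formula: {se_sigma_skewed:.4f} intervals, "
      f"{10 * se_sigma_skewed:.3f} points")
```

The factor of 10 in the last line converts class-intervals to I.Q. points, each interval covering ten points.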

In Chapter XI the formulas for computing standard errors of 
regression curves were given and illustrated. Also the probable 
error of a coefficient of correlation was illustrated. 

The standard error of a coefficient of correlation — simple, mul- 
tiple, or partial — is determined by the following formula: 

$$\text{S.E.}_r = \frac{1 - r^2}{\sqrt{n}}$$





If the right side of this equation is multiplied by .6745, the result 
is the probable error of the coefficient of correlation. This formula 
is less accurate for distributions which depart widely from normal. 

One other measure of standard error is important, and that is 
the measure of the significance of variability between two rates, 
such as per cent, per mille, per hundred thousand, etc. A recent 
paper by Professor Frank A. Ross 3 has emphasized the importance 
of calculating the standard error of rates in ecological studies of 
social phenomena. The formula in general use for computing this 
measure is as follows: 


$$\sigma_R = \sqrt{\frac{R(b - R)}{N}}$$

in which 

σ_R = the standard error of a rate 
R = the rate: per cent, per mille, etc. 
b = the base: 100, 1000, etc. 
N = population 
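
As an illustration of the formula, the following Python sketch computes the standard error of a rate. The figures are hypothetical and are not taken from the text: a rate of 24 per mille in a census tract of 5,000 persons. The function name rate_standard_error is likewise only illustrative.

```python
import math


def rate_standard_error(R, b, N):
    """Standard error of a rate R expressed on base b for a population of N."""
    return math.sqrt(R * (b - R) / N)


# Hypothetical tract: 24 offenses per 1,000 persons, population 5,000.
se = rate_standard_error(24, 1000, 5000)
print(f"rate = 24 per mille, standard error = {se:.2f} per mille")

# The same rate in a tract one-fourth the size has twice the standard error.
se_small = rate_standard_error(24, 1000, 1250)
print(f"smaller tract: standard error = {se_small:.2f} per mille")
```

The comparison of the two tracts shows why rates based upon a small number of cases must be interpreted with caution.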

Suppose crime rates have been computed for two census tracts in a 
city, one being near the central business district and the other at 
some distance from this locality. The rates may be based upon a 
relatively small number of cases, and they may differ considerably. 
If conditions remained the same in the two tracts, would we expect 
similar differences in rates to occur in another year? As Professor 
Ross points out, other questions arise here besides that of scarcity 
of data, but the probable variability of the difference between two 
rates, due to number of cases, may be determined from the fol- 
lowing formula: 


If 


then the difference is significant and may be expected to recur 
under similar conditions. If 


the difference is not significant, either because none really exists or 
because the number of cases is too small to be reliable. The final 
formula is 


If the result from the use of this formula is found to be less than 
the observed difference, then the observed difference is probably 
significant and may be expected to persist in the same direction 
under similar conditions in other years. 

3 Ross, Frank A., “Ecology and the Statistical Method,” Amer. Jour. Soc., 
Jan., 1933, pp. 508-517. 

In applying the principle of standard error, or probable error, 
the student should not assume that this mechanical test rules out 
all other considerations of adequacy and reliability of the sample. 
These measures are applicable only for reasonably large numbers. 
Mills suggests that if the number of items falls below 15 the 
formulas for standard errors should not be applied; in the case of 
correlation, he raised the minimum to 25. Even then the results 
do not warrant great confidence. 4 The application of these 
formulas assumes that, if successive samples were taken at random, 
the statistical measures secured would be distributed according to 
the normal law of error, that is, the normal curve. This 
assumption holds when the number of items is large and the samples 
are random. Yule warns against an easy assumption that the sample 
is large enough to insure reliability. He says, “. . . if n is small, 
the rule that a range of three times the standard error includes 
the majority of the fluctuations of simple sampling of either sign 
does not strictly apply, and the ‘probable error’ becomes of 
doubtful significance.” 5 The adequacy of the sample must always be 
determined by the investigator. 

The errors referred to above are known as “errors of simple 
sampling,” that is, errors due to chance when all precautions have 
been taken to obtain a random sample. But errors in sampling, 
aside from fluctuations due to simple sampling, cannot be accounted 
for by the formulas given. Fluctuations in the sample due to bias 
or inaccurate collection of data are not indicated by measures of 
standard error. These are matters of common sense and careful 
work. 


4 See Mills, op. cit., pp. 559, 560. 

5 Op. cit., p. 353. 


5. EXERCISES 

1. Toss 10 pennies 500 times, keep a record of the heads at each 
throw, and compare the results with the expansion of the 
binomial, (p + q)^10. 

(a) Compare the standard deviation of the experimental data 
with the standard deviation of the theoretical distribution. 

(b) Compare your distribution of heads with that obtained by 
Weldon (see Table LXXX). 

2. The following table gives the hourly output of 14 women button 
workers in a factory over a period of 4 weeks to 4 months, 
showing the production per hour in intervals of .2 of a pound 
and the frequency of the occurrence of production at each 
class-interval: 


TABLE LXXXVII 

Hourly Production and Frequency of Production in Each Interval: 
Button Workers 1 

Class-Interval in Pounds      Frequency

Total                           2,080

Below 1.4                          23
1.4-1.6                            35
1.6-1.8
1.8-2.0                           245
2.0-2.2                           319
2.2-2.4                           351
2.4-2.6                           322
2.6-2.8                           252
2.8-3.0                           194
3.0-3.2                           101
3.2-3.4
3.4-3.6                            35
Above 3.6                          16

1 Florence, P. S., The Statistical Method in Economics and Political 
Science, p. 70. New York: Harcourt, Brace & Co., 1929. 


(a) Determine the mean rate of production and its standard 
deviation. 

(b) Compute the first to fourth moments of this distribution 
and determine the values of β1 and β2. How do these functions 
compare with the corresponding functions of a normal 
distribution? Would you conclude that piece-work rates follow the 
normal law of error? 

(c) Assuming 2,080 items and the standard deviation found, 
compute the values of the ordinates for a normal distribution at 
intervals of .2 of the standard deviation. Make a graph of the 
actual and theoretical distributions. Does the normal curve appear 
to fit closely? 

(d) Assuming 2,080 items, redistribute them for a normal dis- 
tribution, using the table of integrals computed in terms of area. 
Make a graph of the actual and the theoretical distribution of the 
2,080 items. Does the normal curve appear to fit closely? 

(e) What is the standard error of sampling? Apply it to two 
or three different class frequencies. 

(f) Apply the Chi-Square test for goodness of fit to the piece- 
work data. For the probability integral of your result consult Ap- 
pendix A, Table CXXIII. 

(g) Determine the standard errors of the mean and the stand- 
ard deviation. What do these errors tell you about the sample? 

3. Let each student find a group of data which seem to conform 
to the normal curve and compute all the statistical measures 
applied to the piece-work data. These may be secondary data 
published in some book, or they may be primary data gathered 
by the student. This exercise should give the student practice in 
estimating the form of a frequency distribution. 

6. REFERENCES 

Kelley, T. L., Statistical Method, Chap. V. 

Mills, F. C., Statistical Methods, Chaps. XV, XVI. 

Pearl, Raymond, Medical Biometry and Statistics, Chaps. X-XII. 

Rietz, H. L., Handbook of Mathematical Statistics, Chap. V. 

Weld, L. D., Theory of Errors and Least Squares, Chaps. II-IV. 

Yule, G. U., An Introduction to the Theory of Statistics, Chaps. 
XIII-XV. 



CHAPTER XIII 


Time Series 


1. INTRODUCTION 

The most common characteristic of social data is that they vary in 
time. The chronological changes in the quantity and quality of 
social data are especially significant, because we want to know 
whether certain conditions are recurrent and what their general 
tendency of development is. Population facts, marriage, divorce, 
births, deaths, crime, insanity, poverty, and any number of other 
series of social data occur in time. Both private and governmental 
reports of social facts present them as having occurred in certain 
months or years. It is not enough to have the raw data classified 
and put into tables; they must be analyzed to extract the meaning 
that is most significant for an understanding of society and for 
determining social policy. Special methods of analyzing time series 
have been developed, and it is the object of this chapter to describe 
and illustrate the more usual methods. 

Before turning to the technical procedures, however, attention 
may be directed to the logic of time as a category in social statis- 
tics. Social facts change in time, but man has developed ways of 
charting the passage of time. He has set guideposts along the 
route of human history, and he has worked out certain measuring 
sticks which enable him to know how much time has elapsed be- 
tween one social event and another. Some of the measuring sticks 
are based upon astronomical observations. The earth’s relation to 
the sun determines certain physical recurrences, which are the 
effects of the revolution of the earth about the sun and of the 
declination of the earth’s axis. These physical conditions determine 
seasons, and a little thought will show how large is the number 
of social facts affected by the seasons. The rotation of the earth 
on its axis determines night and day, and many social phenomena 
vary as the result of this fact. In the process of adaptation to his 
physical and biological environment man in many ways adapted 
his culture patterns to these astronomical recurrences. Religious 
observances have a definite relation to seasons of the year. Produc- 
tion and consumption habits are notoriously seasonal in their varia- 
tion, or we should not have such widespread efforts to stabilize 
employment. Man devised tools for measuring the length of day 
and year. Weeks and months are different types of temporal units; 
they are purely matters of culture, and their lengths are only 
remotely related to astronomical observations. Of course, all the 
measuring tools for time are matters of culture, but some of them 
divide physical recurrences into definite pieces, such as seconds, 
minutes, hours, and years. Time as duration is not a cause of social 
variation, but physical and social facts undergo change because of 
the interaction of forces in apparently unstable equilibrium in na- 
ture and society. These forces act in time, and it is the resultants 
of their successive actions that the social statistician wants to record 
and analyze. Hence, he adopts the conventional units of time as 
a sort of jointed clothes-line upon which to hang social facts at 
regular intervals. This brings one kind of order into the mass of 
data, and then he can proceed to study the quantitative variations 
occurring at different points along the clothes-line. Things grow 
and endure for a certain piece of time, and then they wear out or 
disintegrate, even human beings; the social statistician wants to 
know how big a piece of time is required for forces to develop and 
wear out a human being, a dynasty, a nation, or a culture. 

Observation has shown several different kinds of temporal va- 
riations. One of these is called secular, or long-time, trend. In 
social statistics secular trend is the general direction of growth or 
decline of a series of social phenomena over a period of 10, 25, 
100 or more years. The trend may be in a straight line, or it may 
be curvilinear. The duration of a “secular trend” is relative to 
what in a human life seems to be a “long time.” It is a practical 
concept. In fact, we do not have data sufficiently complete for any 
social series to describe its absolute secular trend. Even the logistic 
curve, describing the secular trend of population growth, involves 
a speculative analogy between the life of an organism and the life 
of a human race. But for practical purposes we can speak of the 
secular trend of per capita wealth in a nation, the production of 
automobiles, divorce, crime, or employment. Secular trend is meas- 
ured in terms of the average amount of change per month or year 
over a long period of time. On this basis estimates may be made 
of probable values in succeeding years, though such estimates, 
known as extrapolation, are not reliable if carried far in advance 
of the last actual data. Because the secular trend is not likely to 
show sharp variations within a short period of time, its computa- 
tion is often an aid to social planning. For example, the secular 
trend of the number of children of high school age over a period 
of years would aid school administrators in planning building con- 
struction several years in advance. 

Seasonal fluctuations are another type of temporal change, re- 
curring in wavelike fashion each year. They may be caused by 
something in the physical environment, or they may be due to 
cultural habits or to seasonal fluctuations in some other social series. 
One of the social series best known for its seasonal fluctuations is 
employment. Certain industries, such as building construction, 
seem to be limited by climatic conditions to operating on full time 
during the warm months of the year and on part time during the 
cold months. The packing and canning industries have sharp fluc- 
tuations in the numbers employed because of the fluctuations in 
the flow of livestock and vegetables. But death rates also show 
marked seasonal variations. The attendance at theaters and churches 
has regular ups and downs during the year. Charitable relief goes 
up in the winter and down in the summer. It is important to meas- 
ure the extent of such seasonal fluctuations so that plans may be 
made to meet them as effectively as possible. Efforts at the stabili- 
zation of employment are directed toward eliminating seasonal 
fluctuations in production, and in order to accomplish this it is 
necessary to understand the seasonal fluctuations of all of the 
factors determining seasonal changes in the industry concerned. 

Besides secular trend and seasonal fluctuations, there are cyclical 
variations in social series. These occur at longer intervals than 
seasonal changes but are relatively short as compared with the 
secular trend. The most commonly recognized cyclical variations 
are those shown by business: the booms and the depressions. From 
the peak of one boom to the peak of another may be several years, 
and this period constitutes a cycle. Many social series, such as 
poverty and crime, are correlated with cyclical variations in busi- 
ness. If we think of secular trend as a straight line or a parabola, 
then the cyclical variations represent oscillations above and below 
the trend line. They also are wavelike, but the amplitude of the 
waves is greater than for seasonal fluctuations. Cyclical variations 
are extremely complex in their origin; they seem to result from 
an intricate interaction of a number of social or economic condi- 
tions, over which no control has been achieved. Cyclical unemploy- 
ment is one of the greatest of social problems, but as yet no way 
has been found to reduce its severity. More complete analysis of 
cyclical variations of different social and economic series may lead 
to such an understanding of the problems involved that control 
can be attained. Because of the seriousness to society of cyclical 
variations, it is particularly important for the student to know 
how to measure these changes in time series. 

A fourth kind of temporal variation is known as residual varia- 
tion. This is a term covering a multiplicity of irregular changes in 
social and economic phenomena. A change in some series may be 
due to an earthquake, to storms, to droughts, to a war, or to other 
forces operating at a particular time but not likely to recur at any 
predictable time. The residual changes are what remain after secu- 
lar trend, seasonal fluctuations, and cyclical variations have been 
accounted for. In this volume, however, we are chiefly concerned 
with secular trend and seasonal and cyclical variations. 

2. MEASUREMENT OF SECULAR TREND 

Before proceeding to the computation of the secular trend of a 
series of data, the investigator should decide whether, in order to 
answer his question, allowance should be made for such factors as 
population change, change in the age ratios, or fluctuations in the 
general price level. The secular trend of actual dollars expended 
for the operation of the United States government would be quite 
different from the secular trend of actual expenditures adjusted 
for changes in the general price level. Likewise a phenomenon 
like divorce shows differences, when the gross number is used and 
when divorces are expressed as so many per 1,000 marriages. In 
1889 there were 31,735 divorces in the United States, and in 1928 
the number was 192,342.¹ The 1928 number is thus 606.1 per cent of 
the 1889 number, an increase of 506.1 per cent. For the same years 
the divorce rates per 1,000 marriages were respectively 60 and 166; 
the later rate is only 276.6 per cent of the earlier, an increase of 
176.6 per cent. The secular trend for actual divorces would be much 
more sharply upward than the secular trend for rates per 1,000 
marriages. The investigator must decide which data are best suited 
to his purpose: divorces or divorce rates. If he is interested in the 
absolute increase in divorces, then he would want to know the 
secular trend of the number of divorces granted; if he is con- 
cerned with the relative increase in the rate of divorce, he would 
want the secular trend of annual divorce rates. Whenever the 
secular trend of a social series is to be determined, the decision 
must be made as to whether interest is in relative or in 
absolute variations. 

¹ Reuter, E. B., and Runner, J. R., The Family, p. 212. New York: McGraw-Hill, 1931. 
Quoted from Statistical Abstract of the United States, 1929, p. 91. 
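
The difference between gross numbers and rates can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the function names are illustrative only, and the rates are the rounded figures 60 and 166 quoted above, so the last decimal may differ slightly from a figure computed on unrounded rates. Note that a later value expressed as a per cent of the base always exceeds the per cent increase by exactly 100.

```python
def percent_of_base(base, later):
    """Later value expressed as a percentage of the base value."""
    return 100.0 * later / base


def percent_increase(base, later):
    """Change from base to later, as a percentage of the base value."""
    return 100.0 * (later - base) / base


# Figures quoted in the text.
divorces_1889, divorces_1928 = 31_735, 192_342   # gross numbers of divorces
rate_1889, rate_1928 = 60, 166                   # divorces per 1,000 marriages

for label, base, later in [
    ("number of divorces", divorces_1889, divorces_1928),
    ("divorce rate", rate_1889, rate_1928),
]:
    print(
        f"{label}: {percent_of_base(base, later):.1f} per cent of the base, "
        f"an increase of {percent_increase(base, later):.1f} per cent"
    )
```

Whichever series is chosen, the same two functions apply; the choice between numbers and rates is a matter of the question asked, not of the arithmetic.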

Secular trend may be computed in a number of ways. The first 
to be presented is a graphic method, and the data used are divorce 
rates for Indiana from 1899 to 1928: 

TABLE LXXXVIII 

Divorces per 100,000 Population in Indiana, 1899 to 1928 ¹ 

Year      Divorce Rate 

1899          144 
1900          143 
1901          143 
1902          147 
1903          155 
1904          134 
1905          147 
1906          154 
1907          157 
1908          160 
1909          157 
1910          172 
1911          180 
1912          201 
1913          189 
1914          181 
1915          187 
1916          198 
1917          198 
1918          194 
1919          207 
1920          221 
1921          212 
1922          238 
1923          247 
1924          239 
1925          245 
1926          246 
1927          256 
1928          248 

¹ Data partly from Marriage and Divorce , /p.27, and Marriage and 
Divorce , /p.?p, United States Bureau of the Census, and partly pro- 
vided by Professor Charles R. Metzger, of Indiana University. 

An examination of the table shows that the trend of the divorce 
rate is upward, but the statistical problem is to fit a trend line to 
the data. Is the trend linear or curvilinear? It appears to be linear. 
In order to get a picture of the distribution of divorce rates Figure 
LXV was drawn. The solid line connects the tops of the ordinates 
of the divorce rates: 



[Figure LXV. Trend of Divorce Rates in Indiana, 1899-1928. Vertical axis (Y): divorce rate. Solid line: divorce rate; broken line: trend.] 


The broken line is the line of trend fitted by the method of semi- 
averages. The mean divorce rate for the first 15 years was deter- 
mined, and the circle on the ordinate for 1907 marks it. The mean 
rate for the second 15 years was found and is indicated by the 
circle on the ordinate for 1921. These two semi-averages were 
connected by a straight line, and the line was prolonged in each 
direction, to 1899 and to 1928. The straight line fits the data 
rather well; there is not much question of curvilinearity. The 
trend cuts the 1899 ordinate at 133 and the 1928 ordinate at 243. 
Subtracting 133 from 243, we get 110. If 110 is divided by 30, 
we get 3.7 as the average increase per year in the divorce rate; 
that is, the annual trend value is 3.7. If the trend line were pro- 
jected to 1929, we would add 3.7 to 243 making 246.7. That 
would be the trend value for 1929 and would be an estimate of 
the probable divorce rate in that year. The method of semi- 
averages is easy to use and requires little arithmetical work, but 
it is less exact than other methods of fitting the trend line. 

TABLE LXXXIX 

Moving Averages of Divorce Rates 

                     Four-Year    Four-Year 
         Annual      Moving       Moving       Five-Year    Seven-Year 
Year     Rates       Average      Average      Moving       Moving 
                                  Centered     Average      Average 
           Y           Y'           Y'           Y'           Y' 
(1)       (2)         (3)          (4)          (5)          (6) 

1899      144 
1900      143         144 
1901      143         147          146          146 
1902      147         145          146          144          145 
1903      155         146          146          145          146 
1904      134         148          147          147          148 
1905      147         148          148          149          151 
1906      154         155          152          150          152 
1907      157         157          156          155          154 
1908      160         162          160          160          161 
1909      157         167          165          165          166 
1910      172         178          172          174          174 
1911      180         186          182          180          177 
1912      201         188          187          184          181 
1913      189         190          189          188          187 
1914      181         189          190          191          191 
1915      187         191          190          191          193 
1916      198         194          193          192          193 
1917      198         199          197          197          198 
1918      194         205          202          204          202 
1919      207         209          207          206          210 
1920      221         220          215          214          217 
1921      212         230          225          225          223 
1922      238         234          232          231          230 
1923      247         242          238          236          235 
1924      239         244          243          243          240 
1925      245         247          246          247          246 
1926      246         249          248          247 
1927      256 
1928      248 

Entries in column (3) fall midway between the year on whose line 
they are written and the year following. 
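
The method of semi-averages can also be carried out arithmetically rather than read from the figure. The following is a minimal sketch in Python, not the author's procedure: the function name is illustrative, and the sketch assumes that each semi-average is plotted at the midpoint of its own half of the series (1906 and 1921 for these data). Because the values 133, 243, and 3.7 in the text were read from the graph, the computed figures will not agree with them exactly.

```python
YEARS = list(range(1899, 1929))
RATES = [144, 143, 143, 147, 155, 134, 147, 154, 157, 160,
         157, 172, 180, 201, 189, 181, 187, 198, 198, 194,
         207, 221, 212, 238, 247, 239, 245, 246, 256, 248]


def semi_average_trend(years, values):
    """Return (slope, trend), where trend(year) lies on the straight line
    through the means of the two halves of the series, each mean being
    placed at the middle of its half (an assumption of this sketch)."""
    half = len(values) // 2              # the middle item is ignored if n is odd
    mid_1 = sum(years[:half]) / half     # midpoint of the first half
    mid_2 = sum(years[-half:]) / half    # midpoint of the second half
    mean_1 = sum(values[:half]) / half   # first semi-average
    mean_2 = sum(values[-half:]) / half  # second semi-average
    slope = (mean_2 - mean_1) / (mid_2 - mid_1)

    def trend(year):
        return mean_1 + slope * (year - mid_1)

    return slope, trend


slope, trend = semi_average_trend(YEARS, RATES)
print(f"annual trend increment: {slope:.2f}")
for year in (1899, 1928, 1929):
    print(year, round(trend(year), 1))
```

Calling trend(1929) extends the line one year beyond the data, which is the extrapolation described above and carries the same caution.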

Another method is called the method of the moving average. 
This is illustrated in Table LXXXIX. Since, in using a moving 
average to measure the secular trend, one is not always sure how 
many years to use, it is necessary to try several intervals. The 
moving averages for four years, four years centered, five years, 
and seven years are shown. The first average for the four-year 
interval is based upon the first four rates and is written halfway 
between the rate for 1900 and that for 1901. Any moving average 
for an even number of years would fall between two years; any 
moving average for an odd number of years falls in the middle of 
some year. Therefore, in order to make the even-year moving 
averages comparable with the odd-year moving averages it is 
necessary to take a second step and “center” the four-year moving 
average. This is done by adding the first two four-year averages 
and dividing by two, which gives 145.5, but since the nearest 
whole number is used, the centered moving average is written 
146. The four-year moving average is computed as follows: 


\[ \frac{144 + 143 + 143 + 147}{4} = \frac{577}{4} = 144 \] 

\[ \frac{143 + 143 + 147 + 155}{4} = \frac{588}{4} = 147 \] 

Or the second, third, etc., averages may be found by a short cut: 
add to each average, such as 144, the ± difference between the 
number dropped and the number added (divided by 4), thus, 

\[ \text{Second Average} = 144 + \frac{155 - 144}{4} = 147 \text{ (to the nearest whole number)} \] 

It will be noted that to get the second average the first rate is 
dropped, the other three are retained in the second sum, and a 
new one is added at the end. This is the process by which each 
moving average, of whatever interval, is determined. The four- 
year moving average is centered in the same way but by adding 
only two of the four-year averages. 
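
The drop-one, add-one process lends itself to a short routine. The following is a minimal sketch in Python with illustrative function names. It carries the averages at full precision, whereas the table rounds each average to the nearest whole number before centering, so individual figures may differ by a unit from those printed.

```python
RATES = [144, 143, 143, 147, 155, 134, 147, 154, 157, 160,
         157, 172, 180, 201, 189, 181, 187, 198, 198, 194,
         207, 221, 212, 238, 247, 239, 245, 246, 256, 248]


def moving_average(values, k):
    """k-item moving averages found by the short cut: each new sum drops
    the oldest item of the previous sum and adds the next item."""
    window_sum = sum(values[:k])
    averages = [window_sum / k]
    for i in range(k, len(values)):
        window_sum += values[i] - values[i - k]
        averages.append(window_sum / k)
    return averages


def centered_moving_average(values, k):
    """For an even k: average each pair of adjacent k-item averages so that
    the result falls on a year instead of between two years."""
    averages = moving_average(values, k)
    return [(a + b) / 2 for a, b in zip(averages, averages[1:])]


four_year = moving_average(RATES, 4)               # first falls between 1900 and 1901
four_centered = centered_moving_average(RATES, 4)  # first falls on 1901
five_year = moving_average(RATES, 5)               # first falls on 1901
seven_year = moving_average(RATES, 7)              # first falls on 1902

print(four_year[:2], four_centered[0], five_year[0], round(seven_year[0], 2))
```

The lengths of the four lists (27, 26, 26, and 24) show how many years are lost at the ends of the series as the interval grows.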

The differences between the various moving averages can be 
seen better in a graph, and this will also reveal which average 
seems to fit the data best. To present the averages in graphic form 
it will be necessary to drop certain years at the beginning and at 
the end of the period, because we cannot have a seven-year moving 
average nearer the beginning than 1902 nor nearer the end than 
1925. Figure LXVI shows the three moving averages and the 
actual data. 

The three moving averages appear to fit about equally well, 
though the seven-year average is probably the best. Mills has 
shown that the best moving average for a series of data is one 
equal to the length of the cycle, to a multiple of the cycle, or to a 
period greater than the cycle. The cycles for the divorce rates vary 
somewhat, and that makes more difficult a decision as to the 
length of time required for the moving average. The cycle is usu- 
ally, but not always, about five years in length. That moving aver- 
age which reduces the number of cycles to a minimum is the best 
fit.² That is to say, the moving average which approaches nearest 
to a straight line and at the same time best fits the data is the one 
to use. For the divorce rates the four-year average shows 2 com- 
pleted cycles and the beginning of a third. The five-year average 
shows about 2½ cycles. The seven-year average shows 2 cycles, 
while at the same time it fits the data very closely. Hence, we 
conclude that the seven-year moving average is the one to use in 
this case. 

If the trend is curvilinear, a new difficulty arises. The trend of 
a series which is concave upward presents one problem, and the 
trend of a series which is concave downward presents an opposite 
problem. A moving average of a series with upward concavity 
will always exceed the actual trend values, whereas a moving 
average of a series with downward concavity will be smaller than 
the actual trend values. The moving average is not a good method 
of fitting a trend line to non-linear data, but, if it is to be used, 
“the period of the average should be the shortest which will serve 
to average out the cycles; equal, that is, to the average length of 
one cycle.”³ If the concavity is slight, the errors are naturally less 
than for series showing marked concavity. The flexibility of the 
moving average gives it an advantage as a measure of trend over 
certain other measures, though for some purposes it is not as useful 
as a mathematical curve. 
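
The direction of the error on a curved series can be verified with made-up figures. The following is a minimal sketch in Python; the two series are hypothetical parabolas chosen only for illustration and are not data from the text, and the function names are likewise illustrative.

```python
def centered_average(values, k):
    """Moving average over an odd number k of items, each average being
    paired with the middle item of its window."""
    half = k // 2
    return [sum(values[i - half:i + half + 1]) / k
            for i in range(half, len(values) - half)]


def average_minus_trend(values, k):
    """Moving average minus the true value at the middle of each window."""
    half = k // 2
    averages = centered_average(values, k)
    middles = values[half:len(values) - half]
    return [m - v for m, v in zip(averages, middles)]


xs = range(1, 21)
concave_up = [x * x for x in xs]           # hypothetical trend curving upward
concave_down = [400 - x * x for x in xs]   # hypothetical trend curving downward

up_errors = average_minus_trend(concave_up, 5)
down_errors = average_minus_trend(concave_down, 5)
print("concave upward:  ", up_errors[0])
print("concave downward:", down_errors[0])
```

For these parabolas the five-item average lies a constant amount above the upward-curving series and the same amount below the downward-curving one; a longer interval enlarges the error, which is the reason for choosing the shortest period that averages out the cycles.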

² For a demonstration of the moving average of best fit, see Mills, op. cit., 
pp. 260-265. 

³ Op. cit., p. 267. See also pp. 265-267 for demonstration of error in moving 
averages for curvilinear series. 

[Figure LXVI. Divorce rates: annual rates, four-year average centered, five-year average, and seven-year average.] 

Some data show a sufficiently definite and consistent trend to 
justify fitting to them a mathematical instead of a moving average 
curve. If the same forces operate over a long period of time to 
produce the changes occurring in the series, the trend is likely to 
be of this definite type. Where additional forces enter to affect 
variation during the period, the changes in the series are likely to 
be irregular and will be more adequately represented by a moving 
average than by a mathematical curve. For example, the State of 
South Carolina does not allow divorce on any grounds. If the law 
were amended to permit divorce on one ground, a record of di- 
vorce cases would appear in the state. If somewhat later several 
other grounds were permitted for divorce, the curve would doubt- 
less show a sharp turn upward. It is such irregularities that make 
it inadvisable to fit mathematical curves to some social data, though, 
of course, there are series to which such a curve may be fitted with 
accuracy. 

Trend as indicated by a moving average is obviously empirical; 
it assumes no law of growth. But a mathematical curve is a method 
of stating a law of change. As a matter of fact, mathematical 
curves fitted to social data are also empirical, but they imply 
greater certainty concerning trend than does a moving average. 
Because of this empirical character, the implications of a mathe- 
matical curve should be definitely hedged about with cautions. 
In the present state of the development of the social sciences we 
cannot state laws in the sense that they can be stated in the natural 
sciences. Too many factors are either unknown or cannot be taken 
into account because of their qualitative nature. Nevertheless, 
some of the methods of fitting mathematical curves to social data 
can be illustrated for purposes of experimentation on the part of 
the student. As more reliable data accumulate, approximations to 
laws of change may be discovered and stated with accuracy in 
mathematical terms. 

In order to show the varying degrees of fit in different curves, 
we shall use the divorce data for illustrating the computation of 
mathematical curves. Three mathematical curves will be fitted to 
the divorce data: a straight line, a second degree parabola, and a 
logarithmic curve. Then the curves will be compared with each 
other and with the seven-year moving average. A straight line will 
be fitted to the data first by the method of least squares. The gen- 
eral equation for the line will be Y = a + bX. The problem is 
to compute the values of a and b, and the method is shown in 
Table XC. 



TABLE XC 

Fitting a Straight Line to the Divorce Data 

        Number                                  Estimated 
        of the   Divorce                        Values 
Year    Year     Rates       X²        XY       (Trend) 
          X        Y                              Y' 

1899     1     144       1      144     127.6 
1900     2     143       4      286     131.9 
1901     3     143       9      429     136.2 
1902     4     147      16      588     140.5 
1903     5     155      25      775     144.8 
1904     6     134      36      804     149.1 
1905     7     147      49     1029     153.4 
1906     8     154      64     1232     157.7 
1907     9     157      81     1413     162.0 
1908    10     160     100     1600     166.3 
1909    11     157     121     1727     170.6 
1910    12     172     144     2064     174.9 
1911    13     180     169     2340     179.2 
1912    14     201     196     2814     183.5 
1913    15     189     225     2835     187.8 
1914    16     181     256     2896     192.1 
1915    17     187     289     3179     196.4 
1916    18     198     324     3564     200.7 
1917    19     198     361     3762     205.0 
1918    20     194     400     3880     209.3 
1919    21     207     441     4347     213.6 
1920    22     221     484     4862     217.9 
1921    23     212     529     4876     222.2 
1922    24     238     576     5712     226.5 
1923    25     247     625     6175     230.8 
1924    26     239     676     6214     235.1 
1925    27     245     729     6615     239.4 
1926    28     246     784     6888     243.7 
1927    29     256     841     7424     248.0 
1928    30     248     900     7440     252.3 

Total  465    5700    9455    97914 

\[ M_x = \frac{465}{30} = 15.5 \] 

\[ M_y = \frac{5700}{30} = 190.0 \] 

\[ b = \frac{\Sigma XY - n M_x M_y}{\Sigma X^2 - n (M_x)^2} = \frac{97914 - 30(15.5)(190.0)}{9455 - 30(15.5)^2} = 4.3 \] 

\[ a = M_y - b M_x = 190.0 - 4.3(15.5) = 123.3 \] 

\[ Y = 123.3 + 4.3X \] 

The trend line is determined by assuming values of X successively 
from 1 to 30, that is, using the first year of the period, the second 
year, etc., as values of X. The trend values are given in the table. 




As suggested when it was fitted by the method of semi-averages, 
the straight line gives a fairly close fit; the differences between the 
actual and the trend values are not great. 
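
The least-squares computation for the straight line can be reproduced directly from the formulas for b and a. The following is a minimal sketch in Python; the function name is illustrative. It carries b at full precision, whereas the worked example rounds b to 4.3 before computing a, so the constant a found here differs somewhat from 123.3.

```python
RATES = [144, 143, 143, 147, 155, 134, 147, 154, 157, 160,
         157, 172, 180, 201, 189, 181, 187, 198, 198, 194,
         207, 221, 212, 238, 247, 239, 245, 246, 256, 248]
XS = list(range(1, len(RATES) + 1))   # number of the year: 1899 = 1, ..., 1928 = 30


def fit_line(xs, ys):
    """Least-squares constants (a, b) of the line Y = a + bX."""
    n = len(xs)
    m_x = sum(xs) / n
    m_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b = (sum_xy - n * m_x * m_y) / (sum_xx - n * m_x ** 2)
    a = m_y - b * m_x
    return a, b


a, b = fit_line(XS, RATES)
print(f"full precision:       a = {a:.2f}, b = {b:.4f}")

# Rounding b to one decimal first, as in the worked example above.
b_rounded = round(b, 1)
a_from_rounded_b = sum(RATES) / len(RATES) - b_rounded * (sum(XS) / len(XS))
print(f"b rounded beforehand: a = {a_from_rounded_b:.2f}, b = {b_rounded}")

trend_values = [a + b * x for x in XS]   # estimated value for each year
```

The second pair of constants shows how much of the difference from the printed equation is due to rounding alone.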

But it may be that a closer fit could be obtained by the use of 
a second degree parabola. Table XCI shows the method: 

TABLE XCI 

Computation of Parabolic Curve 

Year   Rate     X²      XU        U²        XY        UY      Trend 
 or     or      or      or        or                  or      Values 
 X      Y       U       X³        X⁴                  X²Y       Y' 
(1)    (2)     (3)     (4)       (5)       (6)       (7)       (8) 

   1   144     1       1         1     144       144    127.5 
   2   143     4       8        16     286       572    132.0 
   3   143     9      27        81     429      1287    136.4 
   4   147    16      64       256     588      2352    140.8 
   5   155    25     125       625     775      3875    145.2 
   6   134    36     216      1296     804      4824    149.5 
   7   147    49     343      2401    1029      7203    153.9 
   8   154    64     512      4096    1232      9856    158.2 
   9   157    81     729      9561    1413     12717    162.6 
  10   160   100    1000     10000    1600     16000    166.9 
  11   157   121    1331     14641    1727     18997    171.2 
  12   172   144    1728     20736    2064     24768    175.5 
  13   180   169    2197     28561    2340     30420    179.8 
  14   201   196    2744     38416    2814     39396    184.1 
  15   189   225    3375     50625    2835     42525    188.3 
  16   181   256    4096     65536    2896     46236    192.6 
  17   187   289    4913     83521    3179     54043    196.9 
  18   198   324    5832    104976    3564     64142    201.1 
  19   198   361    6859    130321    3762     71478    205.3 
  20   194   400    8000    160000    3880     77600    209.5 
  21   207   441    9261    194481    4347     91287    214.7 
  22   221   484   10648    234256    4862    106964    217.3 
  23   212   529   12167    279841    4876    112148    222.1 
  24   238   576   13824    331776    5712    137088    226.2 
  25   247   625   15625    390625    6175    154375    230.3 
  26   239   676   17576    456976    6214    161564    234.4 
  27   245   729   19683    531441    6615    178605    238.6 
  28   246   784   21952    614656    6888    182864    242.7 
  29   256   841   24389    707281    7424    215296    246.8 
  30   248   900   27000    810000    7440    223200    250.9 

 465  5700  9455  216225   5276999   97914   2091826 

M_x = 15.5      M_y = 190.0      M_u = 315.2 



The general form of the curve is Y = a + bX + cX², and the 
normal equations to be solved to determine the values of the con- 
stants are: 

\[ (\Sigma x^2) b + (\Sigma xu) c = \Sigma xy \] 
\[ (\Sigma xu) b + (\Sigma u^2) c = \Sigma uy \] 

The terms in these equations are determined in the following 
manner: 

\[ \Sigma x^2 = \Sigma X^2 - n(M_x)^2 = 9455 - 7207 = 2248 \] 
\[ \Sigma xu = \Sigma XU - n M_x M_u = 216225 - 146568 = 69657 \] 
\[ \Sigma xy = \Sigma XY - n M_x M_y = 97914 - 88350 = 9564 \] 
\[ \Sigma u^2 = \Sigma U^2 - n(M_u)^2 = 5276999 - 2980531 = 2296468 \] 
\[ \Sigma uy = \Sigma UY - n M_u M_y = 2091826 - 1796640 = 295186 \] 

Substituting these values in the normal equations, we have: 

\[ \text{(I)} \quad 2248b + 69657c = 9564 \] 
\[ \text{(II)} \quad 69657b + 2296468c = 295186 \] 

These equations must now be solved simultaneously. The Doo- 
little method will be used. Equation (I) will be divided through 
by the coefficient of b in the first equation with the sign changed, 
and then it will be set down with the derived equation (I') be- 
low it: 

\[ \text{(I)} \quad 2248b + 69657c = 9564 \] 
\[ \text{(I')} \quad -b - 30.99c = -4.25 \] 

Equation (II) is then set down, and under it is written equation 
(I') which has been multiplied by the coefficient of c in equa- 
tion (I): 

\[ \begin{aligned} \text{(II)} \quad 69657b + 2296468c &= 295186 \\ (69657)\,\text{(I')} \quad -69657b - 2158670c &= -296042 \\ \text{Adding,} \quad 137798c &= -856 \\ c &= -.006 \end{aligned} \] 

Substituting this value of c in either equation (I) or equation (II), 
we find the value of b: 

\[ b = 4.44 \] 

With these values known we can now determine the value of a 
by substituting the appropriate values in the following equation: 

\[ a = M_y - bM_x - cM_u = 190.0 - 4.44(15.5) - (-.006)(315.2) = 123.1 \] 

The equation of the curve can now be stated: 

\[ Y = 123.1 + 4.44X - .006X^2 \] 
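
The same two normal equations, written in deviations from the means, can be solved by machine. The following is a minimal sketch in Python with an illustrative function name; it solves the pair of equations by determinants rather than by the Doolittle layout, which is a substitution made for brevity. Carried out at full precision from the thirty rates, the fit need not agree with the printed constants, which rest on rounded intermediate figures and hand arithmetic.

```python
RATES = [144, 143, 143, 147, 155, 134, 147, 154, 157, 160,
         157, 172, 180, 201, 189, 181, 187, 198, 198, 194,
         207, 221, 212, 238, 247, 239, 245, 246, 256, 248]
XS = list(range(1, len(RATES) + 1))


def fit_parabola(xs, ys):
    """Least-squares constants (a, b, c) of Y = a + bX + cU, with U = X squared."""
    n = len(xs)
    us = [x * x for x in xs]
    m_x, m_u, m_y = sum(xs) / n, sum(us) / n, sum(ys) / n
    # Sums of squares and products of deviations from the means.
    s_xx = sum((x - m_x) ** 2 for x in xs)
    s_xu = sum((x - m_x) * (u - m_u) for x, u in zip(xs, us))
    s_uu = sum((u - m_u) ** 2 for u in us)
    s_xy = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))
    s_uy = sum((u - m_u) * (y - m_y) for u, y in zip(us, ys))
    # Solve  s_xx*b + s_xu*c = s_xy  and  s_xu*b + s_uu*c = s_uy.
    det = s_xx * s_uu - s_xu * s_xu
    b = (s_xy * s_uu - s_uy * s_xu) / det
    c = (s_uy * s_xx - s_xy * s_xu) / det
    a = m_y - b * m_x - c * m_u
    return a, b, c


a, b, c = fit_parabola(XS, RATES)
trend_values = [a + b * x + c * x * x for x in XS]
residuals = [y - t for y, t in zip(RATES, trend_values)]
print(f"a = {a:.2f}, b = {b:.3f}, c = {c:.4f}")
```

A useful check on any such solution is that the residuals sum to zero and are uncorrelated with both X and U, which is what the normal equations require.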

The trend values in column (8) of Table XCI were determined 
by successively substituting values of X from 1 to 30. Since these 
values do not vary widely from the original data, it is possible to 
use this line of trend. But before a comparison is made between 
the various lines of trend computed, we shall fit one more curve 
to the data, a logarithmic curve: 

Log Y = a + b log X 

Table XCII shows the method: 

TABLE XCII 

Computation of Logarithmic Curve 

Year   Rate   Logarithm  Logarithm   Product     Square     Trend 
                of X       of Y      of the      of the     Values 
 X      Y                            Logarithms  Logarithm 
                                                 of X         Y' 
(1)    (2)       (3)        (4)        (5)        (6)        (7) 

   1   144      .0000    2.1584     .0000     .0000    111.3 
   2   143      .3010    2.1553     .6487     .0906    128.5 
   3   143      .4771    2.1553    1.0283     .2276    139.7 
   4   147      .6021    2.1673    1.3049     .3625    148.3 
   5   155      .6990    2.1903    1.5310     .4686    155.3 
   6   134      .7782    2.1271    1.6553     .6056    161.3 
   7   147      .8451    2.1673    1.8316     .7142    166.5 
   8   154      .9031    2.1875    1.9755     .8156    171.2 
   9   157      .9542    2.1959    2.0953     .9105    175.4 
  10   160     1.0000    2.2041    2.2041    1.0000    179.2 
  11   157     1.0414    2.1959    2.2868    1.0845    182.8 
  12   172     1.0792    2.2355    2.4126    1.1647    186.1 
  13   180     1.1139    2.2553    2.5122    1.2408    189.3 
  14   201     1.1461    2.3032    2.6397    1.3135    192.1 
  15   189     1.1761    2.2765    2.6774    1.3832    194.9 
  16   181     1.2041    2.2577    2.7185    1.4499    197.5 
  17   187     1.2304    2.2718    2.7952    1.5139    200.0 
  18   198     1.2553    2.2967    2.8830    1.5758    202.4 
  19   198     1.2788    2.2967    2.9370    1.6353    204.7 
  20   194     1.3010    2.2878    2.9764    1.6926    206.8 
  21   207     1.3222    2.3160    3.0622    1.7482    208.9 
  22   221     1.3424    2.3444    3.1471    1.8020    211.0 
  23   212     1.3617    2.3263    3.1677    1.8542    212.9 
  24   238     1.3802    2.3766    3.2802    1.9050    214.8 
  25   247     1.3979    2.3927    3.3448    1.9541    216.6 
  26   239     1.4150    2.3784    3.3654    2.0022    218.4 
  27   245     1.4314    2.3892    3.4199    2.0489    220.1 
  28   246     1.4472    2.3909    3.4601    2.0944    221.8 
  29   256     1.4624    2.4082    3.5218    2.1386    223.4 
  30   248     1.4771    2.3945    3.5369    2.1818    225.0 

Total         32.4236   68.1028   74.4196   38.9788 

Mean of the logarithms of X = 1.0808      Mean of the logarithms of Y = 2.2701 

The values of a and b in the general formula may now be deter- 
mined in the following manner: 

\[ b = \frac{74.4196 - 73.6057}{38.9788 - 35.0439} = .2068 \] 

\[ a = M_{\log Y} - b\,M_{\log X} = 2.2701 - .2235 = 2.0466 \] 




The equation for the line of trend will then be: 

Log Y = 2.0466 + .2068 log X 

Substituting successively the values of the logarithms of X, we 
determine the logarithms of Y, that is, the logarithms of the trend 
values. These values may then be looked up in a table of loga- 
rithms and the trend values in natural numbers determined. That 
has been done in column (7) of Table XCII. These trend values 
are obviously not a good fit. In the middle of the period they are 
considerably higher than the original data, and at each end they 
are much smaller. 
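
Fitting the logarithmic curve is the straight-line computation applied to the logarithms of X and Y. The following is a minimal sketch in Python with illustrative function names; it uses full-precision common logarithms instead of a four-place table, so the constants agree with the printed ones only approximately.

```python
import math

RATES = [144, 143, 143, 147, 155, 134, 147, 154, 157, 160,
         157, 172, 180, 201, 189, 181, 187, 198, 198, 194,
         207, 221, 212, 238, 247, 239, 245, 246, 256, 248]
XS = list(range(1, len(RATES) + 1))


def fit_line(xs, ys):
    """Least-squares constants (a, b) of the line Y = a + bX."""
    n = len(xs)
    m_x = sum(xs) / n
    m_y = sum(ys) / n
    b = (sum(x * y for x, y in zip(xs, ys)) - n * m_x * m_y) / (
        sum(x * x for x in xs) - n * m_x ** 2)
    return m_y - b * m_x, b


def fit_log_curve(xs, ys):
    """Constants (a, b) of Log Y = a + b log X, using common logarithms."""
    log_x = [math.log10(x) for x in xs]
    log_y = [math.log10(y) for y in ys]
    return fit_line(log_x, log_y)


a, b = fit_log_curve(XS, RATES)
# Antilogarithms give the trend values in natural numbers.
trend_values = [10 ** (a + b * math.log10(x)) for x in XS]
print(f"Log Y = {a:.4f} + {b:.4f} log X")
print([round(t, 1) for t in trend_values[:3]])
```

Because the fitted curve rises steeply at first and flattens later, its trend values fall below the data at both ends of the period, which is the poor fit noted above.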

We are now ready to compare the differences in the trend values 
found by the seven-year moving average, the straight line, the 
parabola, and the logarithmic curve. Table XCIII gives results. 

TABLE XCIII 

Comparison of Trend Values Derived by a 7-Year Moving Average, a Straight 
Line, a Second Degree Parabola, and a Logarithmic Curve 

       Di-    7-Year Mov-                                         Log Y = 
Year   vorce  ing Average    Y = a + bX     Y = a + bX + cX²   a + b log X 
       Rate 
        Y      Y'   Y-Y'      Y'    Y-Y'      Y'    Y-Y'      Y'    Y-Y' 

1899   144                127.6    16.4   127.5    16.5   111.3    32.7 
1900   143                131.9    11.1   132.0    11.0   128.5    14.5 
1901   143                136.2     6.8   136.4     6.6   139.7     3.3 
1902   147   145     2    140.5     6.5   140.8     6.2   148.3    -1.3 
1903   155   146     9    144.8    10.2   145.2     9.8   155.3     -.3 
1904   134   148   -14    149.1   -15.1   149.5   -15.5   161.3   -27.3 
1905   147   151    -4    153.4    -6.4   153.9    -6.9   166.5   -19.5 
1906   154   152     2    157.7    -3.7   158.2    -4.2   171.2   -17.2 
1907   157   154     3    162.0    -5.0   162.6    -5.6   175.4   -18.4 
1908   160   161    -1    166.3    -6.3   166.9    -6.9   179.2   -19.2 
1909   157   166    -9    170.6   -13.6   171.2   -14.2   182.8   -25.8 
1910   172   174    -2    174.9    -2.9   175.5    -3.5   186.1   -14.1 
1911   180   177     3    179.2      .8   179.8      .2   189.3    -9.3 
1912   201   181    20    183.5    17.5   184.1    16.9   192.1     8.9 
1913   189   187     2    187.8     1.2   188.3      .7   194.9    -5.9 
1914   181   191   -10    192.1   -11.1   192.6   -11.6   197.5   -16.5 
1915   187   193    -6    196.4    -9.4   196.9    -9.9   200.0   -13.0 
1916   198   193     5    200.7    -2.7   201.1    -3.1   202.4    -4.4 
1917   198   198     0    205.0    -7.0   205.3    -7.3   204.7    -6.7 
1918   194   202    -8    209.3   -15.3   209.5   -15.5   206.8   -12.8 
1919   207   210    -3    213.6    -6.6   214.7    -7.7   208.9    -1.9 
1920   221   217     4    217.9     3.1   217.3     3.7   211.0    10.0 
1921   212   223   -11    222.2   -10.2   222.1   -10.1   212.9     -.9 
1922   238   230     8    226.5    11.5   226.2    11.8   214.8    23.2 
1923   247   235    12    230.8    16.2   230.3    16.7   216.6    30.4 
1924   239   240    -1    235.1     3.9   234.4     4.6   218.4    20.6 
1925   245   246    -1    239.4     5.6   238.6     6.4   220.1    24.9 
1926   246                243.7     2.3   242.7     3.3   221.8    24.2 
1927   256                248.0     8.0   246.8     9.2   223.4    32.6 
1928   248                252.3    -4.3   250.9    -2.9   225.0    23.0 
The mean deviations from the actual data, disregarding algebraic 
signs, are as follows: 


7-Year Moving Average 5.8 

Straight Line 8.0 

Parabola 8.3 

Logarithmic Curve 15.4 


The mean of the deviations from the moving average is the small- 
est. It should be noted, however, that the mean of the deviations 
from the moving average is based upon 24 instead of 30 years, 
that is, 1902 to 1925. We may say, then, that the moving average 
gives the closest and the logarithmic curve the worst fit. The 
moving average is flexible, and this gives it a general advantage 
over other methods of smoothing time series. Its chief limitation 
lies in the fact that the larger the number of years included in 
the moving average period, the more years at each end of the 
series will be left without any average. Hence, the choice of a 
moving average or some other measure of trend will depend, not 
only upon the closeness of fit, but upon whether it is important 
to the problem to have an average for every year in the period. 
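
The limitation just described can be seen in a short sketch. The series below is hypothetical (it is not the data of Table XCIII), and the function names are illustrative only; the point is simply that a seven-year centered average leaves three years at each end of a 30-year series without a value, so that the mean deviation rests on 24 years.

```python
def centered_moving_average(values, window=7):
    """Centered moving average; returns (position, average) pairs."""
    if window % 2 == 0:
        raise ValueError("use an odd window so each average centers on a year")
    half = window // 2
    pairs = []
    for i in range(half, len(values) - half):
        segment = values[i - half:i + half + 1]
        pairs.append((i, sum(segment) / window))
    return pairs


def mean_absolute_deviation(actual, fitted):
    """Mean of the deviations, disregarding algebraic signs."""
    return sum(abs(a - f) for a, f in zip(actual, fitted)) / len(actual)


# Hypothetical 30-year series (not the data of Table XCIII).
series = [150 + 3 * t + (5 if t % 4 == 0 else -2) for t in range(30)]

smoothed = centered_moving_average(series, window=7)
actual = [series[i] for i, _ in smoothed]
fitted = [avg for _, avg in smoothed]

print(len(series), "years in the series")
print(len(smoothed), "years receive a moving average")
print(round(mean_absolute_deviation(actual, fitted), 2))
```

A longer window smooths more heavily but shortens the averaged series further, which is the trade-off the paragraph above points out.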

3. MEASUREMENT OF SEASONAL FLUCTUATIONS 

Seasonal fluctuations in social phenomena have been recognized 
by everyone. Besides the theoretical interest in understanding the 
amount of seasonal fluctuation in various types of social phe- 
nomena, there are important practical interests. In social planning 
it is necessary to know when seasonal fluctuations come and how 
great they are so that they may be taken into consideration. For 
example, the marked seasonal changes in the demands made upon 
charitable relief agencies have to be considered in apportioning 
the budget of such agencies in order properly to spread expendi- 
tures throughout the year. Mortality and morbidity also vary with 
seasons, and both private physicians and public health officers need 
to know what the seasonal fluctuations are for different diseases 
and for all diseases taken together. Students of climate in relation 
to human behavior have noted changes in the efficiency of work- 
ers under varying temperature and humidity, both of which have 
mean seasonal fluctuations. Another practical interest in this sub- 
ject is the desire to eliminate the seasonal influence on social 
phenomena, when the principal interest is in the cyclical variations. 
Cycles cover longer intervals of time than seasonal recurrences, 
and, if the swing of these longer social changes is to be measured 
accurately, due allowance must be made for the regularly recur- 
ring seasonal variations. Otherwise one might interpret an upswing 
or a downswing of the curve as a cyclical variation, when in fact 
it was only the normal seasonal fluctuation and the direction of the 
cyclical change might be in the opposite direction. This is well 
illustrated by the level of employment. The cycle of employment 
may be going up in the winter months, but it is almost certain 
that the seasonal fluctuation is downward during that period in 
any year. If we are to make allowance for these different types 
of variation, we must have some way of measuring the quantity of 
seasonal change. 

Several such methods have been proposed and used. For pur- 
poses of illustration mortality data for the State of Indiana from 
1911 to 1930 will be used. Mortality rates per 1,000 population 
in the state are published each month. In Table XCIV these 
mortality rates have been changed a little in order to enlarge the 
figures dealt with; this makes variations more obvious. The rates 
per 1,000 population were reduced to index numbers by expressing 
each monthly rate in terms of a percentage of the mean mortality 
rate in 1911; that is, 1911 was used as a base year. 

Seasonal indexes for these data might be computed in any of 
the following ways: (1) by taking the mean rate of all January 
rates, of all February rates, etc., after which we would have 
percentage figures for each month, and the total for the 12 months 
would be 1,200; (2) by arranging the rates for each month in an 
array, or a multiple frequency table, and taking the mean of the 
2, 4, 6, or more middle rates; (3) by Persons’ chain-link-median 
method; (4) by Falkner’s method of computing the ratio of the 
original data to the trend values and taking the adjusted median 
monthly values; (5) by the method of a twelve-month moving 
average, centered, in connection with the method of median 
monthly values. Methods (1), (2), and (4) will be illustrated. 
Method (3) is reliable and has been used extensively by the 
Harvard Committee on Economic Research, but it does not seem to 
have any advantage over methods (2) and (4), and it requires 
much more arithmetical work. Method (5) is easy to understand 
and to compute, but the same practical objection may be raised 
against it as against Persons’ method. The other three methods 
illustrate simple ways of computing seasonal indexes, and (2) and 
(4) are highly reliable. 

TABLE XCIV 

Mortality Rates in Indiana, 1911-1930, Expressed as Percentages of the Mean Monthly Rate in 1911 

[Rotated table; body not recoverable from the scan. It gives one index for each month, January through December, of each year from 1911 to 1930.] 

A simple method of getting a picture of seasonal fluctuations 
in a series of data is that of the multiple frequency table. The 
mortality rates are distributed in this manner in Table XCV. The 
seasonality of mortality rates appears to be definite: the rates are 
high in the winter months and low in the summer months; other 
seasons of the year show rates lying between these extremes, 
except that March seems to have the highest rates of any month. 
This table may also be used in computing seasonal indexes by 
method (2). 

TABLE XCV 

Multiple Frequency Table of Mortality Rates Showing 
Seasonal Variations 

[Tally table; the tally marks are not recoverable from the scan. Columns: the twelve months, January through December. Rows: rate classes in steps of 5, from 75-80 up to 145-150, and Over 150. Each tally mark records one year's rate for that month.] 

Method (1) will be illustrated first. The means of the rates for 
the 20 Januaries, 20 Februaries, etc., are given in Table XCVI, 
along with the adjusted indexes of seasonal variation: 


TABLE XCVI 

Computation of Seasonal Indexes for the Mortality Data by Method (1) 

             Sum of Each   Monthly    Monthly Averages   Monthly
             20-Mo.        Average    Adjusted:          Variations
             Group of      of Rates   Seasonal Index     from 100,
Month        Rates                                       Column (3)
(1)          (2)           (3)        (4)                (5)

January      2266.9        112.3      112.4               12.4
February     2235.4        111.2      111.2               11.1
March        2378.9        118.3      118.3               18.3
April        2161.5        107.5      107.5                7.5
May          1943.1         96.7       96.8               -3.2
June         1738.5         86.5       86.6              -13.4
July         1811.3         90.1       90.1               -9.9
August       1810.2         90.1       90.1               -9.9
September    1801.2         89.6       89.6              -10.4
October      1954.5         97.2       97.2               -2.8
November     1890.5         94.0       94.0               -6.0
December     2130.6        106.0      106.2                6.2

Total                     1199.5     1200.0
Mean                        99.9      100.0


The sum of the monthly mean rates is 1,199.5. A seasonal index 
is more convenient to use, if the sum is even 1,200.0. If each of 
the mean monthly rates is divided by the mean of all monthly 
rates, that is, 99.9, the adjusted averages are those given in col- 
umn (4), and the total is 1,200.0, and the mean of the monthly 
indexes is 100.0. The relative values of the monthly indexes have 
not been changed by the adjustment. In column (5) the seasonal 
variations from the monthly average of 100.0 are given. It is 
quite clear that marked seasonal variations in death rates do occur. 
They range from 13.4 below to 18.3 above the monthly average. 
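
The adjustment to a total of 1,200 may be sketched as follows. The monthly means are those of column (3) of Table XCVI; the variable names are illustrative only. The sketch scales by the unrounded total, whereas the text divides by the rounded mean of 99.9, so a printed index may differ from the computed one by a tenth here and there.

```python
# Column (3) of Table XCVI: mean index for each calendar month.
monthly_means = {
    "January": 112.3, "February": 111.2, "March": 118.3, "April": 107.5,
    "May": 96.7, "June": 86.5, "July": 90.1, "August": 90.1,
    "September": 89.6, "October": 97.2, "November": 94.0, "December": 106.0,
}

total = sum(monthly_means.values())   # about 1,199.5
scale = 1200.0 / total                # same as dividing each mean by (mean of means / 100)

seasonal_index = {m: v * scale for m, v in monthly_means.items()}
variation = {m: idx - 100.0 for m, idx in seasonal_index.items()}

for month in monthly_means:
    print(f"{month:<10} {seasonal_index[month]:6.1f} {variation[month]:+6.1f}")
```

Because every mean is multiplied by the same factor, the relative values of the monthly indexes are unchanged, as the text states.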

Although method (1) is the simplest method for computing a 
seasonal index, it has one important weakness. It is characteristic 
of the mean to be affected unduly by the extreme variations, and 
the simple mean has been used to derive this index. Consequently, 
we should expect this index to exaggerate the seasonal variations. 
It may be used as a rough measure, but it is not as precise as a 
seasonal index may be made. 




One other correction needs to be applied to the seasonal index, 
and that is the correction for secular trend. For example, mor- 
tality rates in Indiana have been declining during this 20-year 
period. The mean monthly decline should be added to the sea- 
sonal index to make this adjustment, because the secular trend is 
downward; if it were upward, the secular trend would be sub- 
tracted. Table XCVII gives the corrected monthly averages. The 
secular trend of the mortality rates is nonlinear; a second degree 
parabola fitted to the data gives a fairly close fit, of which the 
equation is: 

Y = 102.5 + .292X - .0355X^2 

The standard error of estimate is 5.99. When the annual trend 
values are computed by this formula, the mean annual decrease 
in the mortality rate is found to be .435, and the mean monthly 
decrease is .036. In order to correct for this amount of trend, using 
January as a base, .04 must be added to the February mortality 
rate, .08 to the March rate, .12 to the April rate, etc. When these 
additions have been made, the 12 monthly rates are added and an 
adjustment is made so that the indexes of seasonal variation 
equal 1,200. 
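
A minimal sketch of this correction follows. It assumes that X = 1 stands for 1911, an origin not stated in the text but one that reproduces the trend values 102.8 for 1911 and 94.1 for 1930 printed in Table CII. The function names are illustrative, and the step of .036 is applied unrounded, so the results will not agree to the last tenth with the printed table, which uses steps of .04.

```python
def trend(x):
    """Parabolic trend from the text; x = 1 is assumed to stand for 1911."""
    return 102.5 + 0.292 * x - 0.0355 * x ** 2


annual_trend = [round(trend(x), 1) for x in range(1, 21)]   # 1911 to 1930

MONTHLY_DECREASE = 0.036   # mean monthly decrease stated in the text


def correct_for_trend(monthly_means, step=MONTHLY_DECREASE):
    """Add k * step to the k-th month after January, then rescale to 1,200."""
    corrected = [m + k * step for k, m in enumerate(monthly_means)]
    scale = 1200.0 / sum(corrected)
    return [c * scale for c in corrected]


# Column (2) of Table XCVII, January through December.
means = [112.3, 111.2, 118.3, 107.5, 96.8, 86.6,
         90.1, 90.1, 89.6, 97.2, 94.0, 106.2]
indexes = correct_for_trend(means)

print(annual_trend[0], annual_trend[-1])
print([round(v, 1) for v in indexes])
```

Since the trend is downward the step is added; for an upward trend the same function could be called with a negative step.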


TABLE XCVII 

Monthly Averages of Mortality Indexes Corrected for Secular Trend 

             Monthly      Monthly      Corrected Averages   Monthly
             Average of   Averages     Adjusted to Equal    Variations
             Mortality    Corrected    1200:                from 100,
Month        Indexes      for Trend    Seasonal Index       Column (3)
(1)          (2)          (3)          (4)                  (5)

January      112.3        112.3        112.0                 12.0
February     111.2        111.7        111.4                 11.4
March        118.3        119.0        118.7                 18.7
April        107.5        107.6        107.3                  7.3
May           96.8         97.0         96.8                 -3.2
June          86.6         86.0         85.8                -14.2
July          90.1         90.3         90.1                 -9.9
August        90.1         90.4         90.2                 -9.8
September     89.6         89.9         89.7                -10.3
October       97.2         97.6         97.4                 -2.6
November      94.0         94.4         94.2                 -5.8
December     106.2        106.6        106.4                  4.6

Total                    1202.8       1200.0
Mean                      100.2        100.0


There is an error in the final seasonal indexes as given in column 
(4) of Table XCVII due to the fact that the trend is non-linear. 
The correction for trend is the average monthly decrease in mor- 
tality rates, but this involves an assumption of regular monthly 
decrements which would necessitate linearity of trend. Actually 
there is a slight acceleration in the rate of decrease. Hence, the 
correction to be added to December, when January is used as a 
base, is not exactly 11 times the correction added to February, 
but a little more than that because of the parabolic nature of the 
trend. In a similar manner the corrections for the other months 
are slightly erroneous. 

These deviations from the monthly average for the mean year 
are of great value in estimating the actual relative importance of 
a mortality rate in any particular month. They show that every 
year there are normally months with rates higher than average 
and other months with rates normally lower than average. These 
variations reflect neither irregular causes of death nor the secular 
trend. They do show the regularly recurring variations during 
any year. Allowing for the possibility of unusual deviations, they 
indicate what is normally to be expected each month. 

Method (2) eliminates the error due to the use of the mean 
monthly rates. It might be called the mean-median method, 
because the 20 Januaries, 20 Februaries, etc., are arranged in an 
array or a multiple frequency table, as in Table XCV, and 2 or more 
of the middle values are added and the mean of these values is 
obtained. Hence, the extreme values exercise less influence on 
the final seasonal indexes than they do when method (1) is used. 
The mean-median method has been used by various writers in 
the computation of more than one kind of seasonal index, but 
Professor Chapin’s use of it in connection with dependency indexes is 
especially pertinent to the interests of the social statistician. He 
has used it, along with measures of trend and cyclical variations, 
for the purpose of eliminating the seasonal factor in order to 
arrive at a measure of the residual variations in Minneapolis relief 
statistics.⁴ Professor Chapin eliminates the seasonal variations 
from his dependency data by subtracting the seasonal factor from 
the original data, after which he removes the trend values, thus 
leaving only the cyclical and residual variations. For practical 
reasons he found this order of segregation desirable in his problem. 
Other writers have removed trend values from the data before 
subtracting the seasonal factor. The order of elimination will 
depend upon the purposes of the investigator, but the net results 
should be substantially the same in both cases. 

4 See Chapin, F. S., “Dependency Indexes,” Social Forces, Vol. V, No. 2, 
pp. 215-224. 

TABLE XCVIII 

[Rotated table; title and body not recoverable from the scan. For each month it lists the four middle rates of the array (the 9th, 10th, 11th, and 12th). Only the totals and means survive.] 

         Jan.   Feb.   Mar.   Apr.   May    June   July   Aug.   Sept.  Oct.   Nov.   Dec.   All
Total    436.9  441.9  473.3  435.3  380.1  345.4  354.9  358.4  355.1  376.8  371.7  407.0  4736.8
Means    109.2  110.5  118.3  108.8   95.0   86.4   88.7   89.6   88.8   94.2   92.9  101.8

Since the mortality rates are for an even number of years, the 
median rate comes between the tenth and the eleventh year. If 
we had an odd number of years, the median rate would fall in the 
middle year of the array. In our illustration the seasonal index is 
determined by the mean-median method. In order to give equal 
weight to the rates on each side of the monthly medians, rates for 
the same number of years on each side of the median should be 
used. One, two, three, or more rates above and below the median 
might be taken. For the purpose of illustration two rates on each 
side of the median are used for obtaining the mean. Table XCVIII 
gives the four middle rates and the mean-median value for each 
month. It should be apparent from the construction of the mul- 
tiple frequency table that the rates in each monthly column, or 
array, do not follow the same order of years. The rates for Janu- 
ary at the middle of the array might be for entirely different 
years from those at the middle of the February array. But that 
does not affect the reliability of the results. The aim is to get a 
representative value for mortality rates for each month of the 
year. See Table XCVIII. (See Table XCV for complete data.) 
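
The mean-median value for a single month may be sketched as below. The January figures are hypothetical, chosen to include one epidemic year, and the function name and its per_side argument are illustrative only. With 20 years and two rates on each side of the median, the function averages the 9th through the 12th rates of the array.

```python
def mean_median(rates, per_side=2):
    """Mean of the middle values of an array of rates.

    per_side is the number of rates taken on each side of the median.
    """
    ordered = sorted(rates)
    n = len(ordered)
    if per_side < 1 or 2 * per_side > n:
        raise ValueError("per_side does not fit the number of rates")
    mid = n // 2
    if n % 2 == 0:
        # Median falls between two rates: take per_side rates on each side.
        middle = ordered[mid - per_side:mid + per_side]
    else:
        # Median is itself a rate: keep it and per_side rates on each side.
        middle = ordered[mid - per_side:mid + per_side + 1]
    return sum(middle) / len(middle)


# Hypothetical January indexes for 20 years, with one epidemic year.
januaries = [104, 98, 111, 107, 113, 102, 109, 150, 106, 112,
             101, 108, 110, 105, 99, 114, 103, 115, 100, 116]

print(round(sum(januaries) / len(januaries), 1))  # simple mean, method (1)
print(mean_median(januaries))                      # mean of the 4 middle rates
```

The single extreme year raises the simple mean but leaves the mean-median value untouched, which is the advantage claimed for method (2).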

It remains to correct the means of the columns for secular trend 
and express these mean rates as seasonal indexes adjusted to equal 
1,200 for the 12 months. Table XCIX shows these computations. 

TABLE XCIX 

Mean-Median Rates Corrected for Trend, Adjusted Seasonal 
Indexes, and Variations from Monthly Average of 100 

             Mean-Median                  Variations
             Rates          Seasonal      of Indexes
Month        Corrected      Indexes       from 100
             for Trend
(1)          (2)            (3)           (4)

January      109.2          111.0          11.0
February     110.5          112.2          12.2
March        118.2          120.1          20.1
April        108.6          110.4          10.4
May           94.8           96.3          -3.7
June          86.2           87.6         -12.4
July          88.4           89.8         -10.2
August        89.2           90.6          -9.4
September     88.4           89.8         -10.2
October       93.8           95.3          -4.7
November      93.5           94.0          -6.0
December     101.3          102.9           2.9

Total       1181.0         1200.0
Mean          98.4          100.0

The computation of seasonal indexes by the method of the ratio 
of the actual rates to the trend rates is quite long compared with 
either of the preceding methods. In the first place, it involves 
computation of the monthly trend values. The annual trend val- 
ues were computed by the parabolic equation given above (see p. 
364). In view of the fact that the curvature is slight for any given 
year, we may for convenience assume that the trend line is straight 
and that a constant rate of decrease in death rates exists during 
the year. For example, the annual trend values, computed from 
the equation, will be centered at the middle of each year, because 
they are based upon 12-month averages. This should be shifted 
back to the middle of the first month of the year: January. The 
difference between this figure for January, 1911, and January, 
1912, is found by subtracting 102.7 (January, 1912) from 102.9 
(January, 1911). The decrease for the year is .2. If we carry it 
to one decimal place only, the rate for the first 6 months of the 
year will be assumed to be 102.9 for each month, and for the 
second six months it will be 102.8 for each month. For the first 
6 months in 1912 the rate will be 102.7. Later years show more 
rapid declines in the rate, and at the end of the 20-year period it 
is declining at the rate of .1 each two months. After these com- 
putations are made, the ratio of each monthly actual value to 
each monthly trend value is computed and expressed as a percent- 
age. When this has been done, the 20 Januaries, 20 Februaries, 
etc., are arranged in an array, and the mean of the middle four 
items is taken. This mean, when adjusted so that the 12 monthly 
means equal 1,200, is the seasonal index. This last step is clearly 
a mean-median method, but it has been applied after the trend has 
been removed by a more accurate method than was used in the 
other two illustrations. Table C gives the monthly mean-medians 
and the adjusted indexes.⁵ 
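
The steps of the ratio-to-ordinate computation may be sketched as follows. The data are hypothetical: a fixed seasonal pattern is laid upon a falling trend, so that the method ought to recover the pattern. The function names and the layout of the inputs (one list of 12 monthly values per year) are assumptions made for the illustration and are not part of Dr. Falkner's statement of the method.

```python
def middle_mean(values, per_side=2):
    """Mean of the 2 * per_side middle values of an even-length array."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    middle = ordered[mid - per_side:mid + per_side]
    return sum(middle) / len(middle)


def ratio_to_trend_index(actual, trend, per_side=2):
    """actual and trend hold one list of 12 monthly values per year."""
    by_month = [[] for _ in range(12)]
    for year_actual, year_trend in zip(actual, trend):
        for m in range(12):
            # Ratio of actual value to trend value, as a percentage.
            by_month[m].append(100.0 * year_actual[m] / year_trend[m])
    mean_medians = [middle_mean(ratios, per_side) for ratios in by_month]
    scale = 1200.0 / sum(mean_medians)   # adjust the 12 values to total 1,200
    return [v * scale for v in mean_medians]


# Hypothetical data: a fixed seasonal pattern riding on a falling trend.
pattern = [112, 110, 118, 108, 96, 87, 91, 92, 91, 95, 94, 106]
trend = [[103.0 - 0.4 * y] * 12 for y in range(20)]
actual = [[t * p / 100.0 for t, p in zip(row, pattern)] for row in trend]

index = ratio_to_trend_index(actual, trend)
print([round(v, 1) for v in index])
```

Here the trend is removed month by month before the middle items are averaged, which is the respect in which this method differs from the two preceding illustrations.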

5 This method of computing seasonal indexes is known as the “ratio-to-ordinate” 
method and was developed by Dr. Helen D. Falkner. See “The Measurement 
of Seasonal Variation,” Journal of the American Statistical Association, 
Vol. 19, pp. 167-179. 

TABLE C 

Seasonal Indexes Computed by the Ratio-to-Ordinate 
Method 

Month        Mean-Medians   Adjusted Indexes

January      109.6          110.6
February     108.7          109.6
March        117.1          118.1
April        108.0          108.9
May           96.4           97.2
June          87.4           88.2
July          91.1           91.9
August        91.7           92.5
September     90.9           91.7
October       94.1           94.9
November      92.6           92.4
December     103.1          104.0

Total       1190.7         1200.0
Mean          99.2          100.0

It will now be of interest to put the three types of seasonal 
indexes into one table, where comparisons can be made. In view of the 
fact that the mean-median and the ratio-to-ordinate methods 
are the more accurate, the differences between these are indicated 
in the table. Table CI gives the three seasonal indexes: 

TABLE CI 

Three Seasonal Indexes Compared, Corrected for Secular Trend 

             Method of   Method of   Method of
             Simple      Mean-       Ratio-to-
Month        Means       Medians     Ordinate    (3) - (4)
(1)          (2)         (3)         (4)         (5)

January      112.0       111.0       110.6         .4
February     111.4       112.2       109.6        2.6
March        118.7       120.1       118.1        2.0
April        107.3       110.4       108.9        1.5
May           96.8        96.3        97.2        -.9
June          85.8        87.7        88.2        -.5
July          90.1        89.8        91.9       -1.1
August        90.2        90.6        92.5       -1.9
September     89.7        89.8        91.7       -1.9
October       97.4        95.3        94.9         .4
November      94.2        94.0        92.4        1.6
December     106.4       102.9       104.0       -1.1


Disregarding signs, the mean difference between the mean-median 
and ratio-to-ordinate indexes is 1.3, whereas the mean difference 
between the simple mean and the ratio-to-ordinate method is 1.8, 
and the mean difference between the simple mean and the mean- 
median method is 1.3. These mean differences are fairly close. 
The correction for trend in the case of the mean-median method 
was the mean monthly decrease in the mortality index, which is 
shorter than the correction for trend by the ratio-to-ordinate 
method. In view of these facts the mean-median method should 
be used as a time-saver, unless there are special reasons for pre- 
ferring the ratio-to-ordinate method. 
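
The first of the mean differences quoted above can be checked from column (5) of Table CI as printed. The short sketch below disregards signs and averages the twelve differences; the variable names are illustrative only.

```python
# Column (5) of Table CI as printed: mean-median index minus
# ratio-to-ordinate index, January through December.
differences = [0.4, 2.6, 2.0, 1.5, -0.9, -0.5, -1.1, -1.9, -1.9, 0.4, 1.6, -1.1]

# Mean difference, disregarding signs.
mean_difference = sum(abs(d) for d in differences) / len(differences)
print(round(mean_difference, 1))
```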

4. MEASUREMENT OF CYCLICAL FLUCTUATIONS 

We are familiar with cyclical fluctuations chiefly through the 
discussion of business cycles which has been going on for some 
fifteen years. When the “business cycle” is mentioned, one im- 
mediately thinks of prosperity and depression in business. While 
more study has been given to cyclical variations by economists 
than by other social scientists, some work has been done on other 
social series. The interest in these latter cyclical variations appears 
to have developed out of the theory that economic conditions are 
correlated with a number of other social factors. The first work 
of this sort done in the United States was by Professor William 
F. Ogburn and Dr. Dorothy S. Thomas,⁶ by G. P. Davies,⁷ and 
by Miss K. E. Howland.⁸ The work by Ogburn and Thomas is 
by far the most comprehensive. It considers the relation of the 
business cycle to marriages, divorces, births, deaths, and crime and 
tests the degree of relationship by means of correlation technique. 
Later Dr. Thomas pursued the study further in both the United 
States and England. In both countries she computed the degrees 
of correlation between the business cycle and marriages, births, 
deaths, pauperism, alcoholism, crime, and emigration. Changes in 
some of the social series lag behind changes in the business cycle, 
and allowance had to be made for this fact.⁹ The aim in all of 
these studies was to measure the cyclical variations of social factors 
and then to compute the correlation between each series and the 
business cycle as the independent variable. 

6 See their article, “The Influence of the Business Cycle on Certain Social 
Conditions,” Journal of the American Statistical Association, September, 1922. 

7 “Social Aspects of the Business Cycle,” Quarterly Journal of the University 
of North Dakota, January, 1922. 

8 “A Statistical Study of Poor Relief in Massachusetts,” Journal of the 
American Statistical Association, December, 1922. 

9 Thomas, Dorothy S., Social Aspects of the Business Cycle. London: 
Routledge, 1925. 

The short-time fluctuations called cycles may be computed for 
any social series varying in time. The term “cycle” implies recur- 
rence. It suggests that variations go up for a while and then go 
down, and that these ups and downs recur with a fair degree of 
regularity. They are variations about the line of trend and are 
measured from that line, whether it be linear or curvilinear. Be- 
fore determining the cyclical fluctuations, the seasonal factor 
should be removed from the data. If the data are annual, instead 
of monthly, the seasonal fluctuations do not appear at all. Under 
such circumstances, it is easy to compute a line of trend and then 
subtract the trend values from the actual values. The remaining 
variations will not be explained entirely by cyclical variations, be- 
cause special causes intervene to produce residual variations. Pro- 
fessor Chapin has shown how these residual factors may be deter- 
mined for dependency data. He removed the seasonal, trend, and 
cyclical factors and then found that some fluctuations still re- 
mained. These were the residuals and represented the effects of a 
multiplicity of minor causes. The residuals, he found, were dis- 
tributed approximately in the form of a normal curve.¹⁰ However, 
the most important variations in social data are due to trend, sea- 
sonal, and cyclical factors. Cyclical variations will be illustrated for 
both annual and monthly data. Table CII shows the method of 
computing cyclical variations for the mortality indexes: 

TABLE CII 

Computation of Cyclical Variations for Annual Mortality Indexes Centered 
in the Middle of the Year 

                                  Ratio of Index
         Annual      Annual       to Trend          Variations
         Mortality   Trend        Expressed as a    from 100,
Year     Index       Values       Percentage        or Cycles
(1)      (2)         (3)          (4)               (5)

1911      99.9       102.8         97.2             -2.8
1912     102.3       102.9         99.4              -.6
1913     104.0       103.1        100.9               .9
1914     101.0       103.1         98.0             -2.0
1915      98.3       103.1         95.4             -4.6
1916     103.7       103.0        100.7               .7
1917     109.0       102.8        106.0              6.0
1918     122.7       102.4        119.8             19.8
1919      97.0       102.2         94.9             -5.1
1920     104.6       101.8        102.8              2.8
1921      93.9       101.4         92.6             -7.4
1922      92.9       101.1         91.9             -8.1
1923     100.4       100.3        100.1               .1
1924      97.4        99.6         97.8             -2.2
1925      97.1        98.9         98.2             -1.8
1926     101.0        98.1        102.9              2.9
1927      95.7        97.2         98.5             -1.5
1928      98.1        96.3        101.9              1.9
1929      98.3        95.2        103.2              3.2
1930      93.2        94.1         99.0             -1.0

10 Chapin, F. S., “Dependency Indexes for Minneapolis,” Social Forces, Vol. 
V, No. 2, pp. 220-224. 

The trend values are removed from the mortality indexes by 
taking the ratios of the original data to the trend values and ex- 
pressing them as percentages. We have already referred to the 
trend values as the expected mortality indexes. We may also speak 
of these trend values as the normal death rate, or normal mor- 
tality index. Then, whatever the trend value is, it is 100.0 per 
cent, and the cyclical variations are determined by subtracting 
100.0 from the ratio of original data to trend values. If we take the 
trend as zero, we may express it as a straight line and indicate the 
cyclical variations graphically as follows (see p. 373). 
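
The computation may be sketched for a few rows of Table CII. The cycle is taken here as the ratio minus 100, which is the sign convention of column (5) of that table; the variable names are illustrative only.

```python
# First rows of Table CII plus the epidemic year 1918.
years = [1911, 1912, 1913, 1914, 1918]
index = [99.9, 102.3, 104.0, 101.0, 122.7]     # annual mortality index
trend = [102.8, 102.9, 103.1, 103.1, 102.4]    # annual trend values

# Ratio of index to trend as a percentage, then deviation from 100.
ratios = [100.0 * i / t for i, t in zip(index, trend)]
cycles = [r - 100.0 for r in ratios]

for y, r, c in zip(years, ratios, cycles):
    print(y, round(r, 1), round(c, 1))
```

Rounded to one decimal, these agree with columns (4) and (5) of the table for the years shown.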

The cyclical fluctuations above and below the line of trend are 
considerable. The 1918 rise above “normal” is greatest of any year 
and may be explained by the influenza epidemic which swept the 
country, but there is another factor involved. After 1915 the mor- 
tality index was going up. In 1916 it was almost 5 points higher 
than in the preceding year; this was the second year after the 
depression of 1914-15 began and may be due in part to the after- 
effects of undernourishment and malnutrition during the period of 
depression. A similar change in the fluctuations occurred in 1923, 
about an equal length of time after the depression of 1920-21. 
We have previously noted that the trend in the mortality index, 
though parabolic in form, is downward; that would have to be 
explained by the interaction of a number of factors, such as im- 
provement in medical care, rising economic standard of living, etc. 
The seasonal fluctuations are determined in part by weather con- 
ditions which favor the development of certain diseases and in 
part by other causes, such as lowered income in the winter months. 
While some of the same factors may be operating to determine 
the cyclical fluctuations, it will be seen that they operate in differ- 
ent ways and on different scales of magnitude; the element of 
accidental, or residual, causes enters into the cyclical conditions. 
We get a clearer picture of variations in the mortality index when 
it is analyzed into the three temporal forms. 

[Figure, p. 373: cyclical variations of the annual mortality index about the line of trend, the trend being taken as zero. The chart is not recoverable from the scan.] 

Cyclical variations may also be measured by months. If this is 
done, an additional step in the computation is necessary to remove 
the seasonal factor from the monthly indexes. The monthly trend 
value must be estimated from the annual trend values, or the trend 
values must be computed on a monthly basis. In the illustration 
the trend values have been estimated from their annual change. 
At the middle of the year 1911 the trend value was 102.8, and at 
the middle of the year 1912 it was 102.9. That is, the rate is 
changing .1 per year; in later years it has changed as much as 1.1. 
If the latter figure had been used, it would have been necessary 
to show several changes in trend within the year. Since the trend 
values for the first two years have been used, the trend value for 
each month is assumed to be the trend value for the year. There is 
an assumption in assigning the annual trend value to each month 
to which attention should be called: it is that the trend within the 
year is linear. While that is not strictly true, because the annual 
trend values are measured from a parabolic equation, the variation 
from linearity is so slight that it could not be indicated without 
using several decimal places which would suggest greater precision 
and reliability than the mathematical finesse warrants. 
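
The monthly computation may be sketched for the first four months of 1911, using the figures printed in Table CIII. The ratio is rounded to one decimal before the seasonal index is subtracted, which appears to be the practice followed in the table; the variable names are illustrative only.

```python
# January to April 1911, as printed in Table CIII.
months = ["January", "February", "March", "April"]
mortality_index = [114.6, 113.0, 113.8, 111.6]
trend_value = 102.8                 # annual trend value assigned to each month
seasonal_index = [112.4, 109.3, 118.6, 108.6]

# Ratio of index to trend in percentages, then seasonal index removed.
ratios = [round(100.0 * m / trend_value, 1) for m in mortality_index]
cycles = [round(r - s, 1) for r, s in zip(ratios, seasonal_index)]

for name, r, c in zip(months, ratios, cycles):
    print(f"{name:<9} {r:6.1f} {c:+5.1f}")
```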

Table CIII presents the computation of cyclical variations by 
months in the mortality indexes for 1911 and 1912: 

TABLE CIII

Computation of Cyclical Variations of the Mortality Index by Months

                          Trend Values   Ratio of Index              Cyclical
             Mortality    (Mortality     to Trend in      Seasonal   Variations
Month        Index        Index)         Percentages      Index      (4) - (5)
(1)          (2)          (3)            (4)              (5)        (6)

1911
January      114.6        102.8          111.5            112.4        -.9
February     113.0        102.8          109.9            109.3         .6
March        113.8        102.8          110.7            118.6       -7.9
April        111.6        102.8          108.6            108.6         .0
May           94.5        102.8           91.9             97.0       -5.1
June          85.6        102.8           83.3             87.0       -3.7
July         102.4        102.8           99.6             91.9        7.7
August        92.9        102.8           90.4             92.0       -1.6
September     87.3        102.8           84.9             92.0       -7.1
October       93.7        102.8           91.2             94.8       -3.6
November      91.2        102.8           88.6             93.2       -4.6
December      97.7        102.8           92.1            103.2      -11.1

1912
January      111.4        102.9          108.3            112.4       -4.1
February     111.4        102.9          108.3            109.3       -1.0
March        116.8        102.9          113.5            118.6       -5.1
April        112.1        102.9          109.0            108.6         .4
May           91.1        102.9           88.6             97.0       -8.4
June          84.7        102.9           82.4             87.0       -4.6
July          97.5        102.9           94.8             91.9        2.9
August       100.0        102.9           97.2             92.0        5.2
September    100.8        102.9           98.0             92.0        6.0
October       99.2        102.9           96.5             94.8        2.3
November      94.3        102.9           91.7             93.2       -1.5
December     108.1        102.9          105.1            103.2        1.9
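The arithmetic behind columns (4) and (6) of Table CIII can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `cyclical_variation` and the choice of three sample months are assumptions made here, while the figures themselves are taken from the 1911 rows of the table.

```python
# Sample rows of Table CIII for 1911:
# (month, mortality index, trend value, seasonal index).
ROWS_1911 = [
    ("January", 114.6, 102.8, 112.4),
    ("March", 113.8, 102.8, 118.6),
    ("July", 102.4, 102.8, 91.9),
]


def cyclical_variation(index, trend, seasonal):
    """Return (ratio of index to trend in per cent, cyclical variation).

    The ratio is column (4); the cyclical variation is column (6),
    that is, column (4) minus the seasonal index in column (5).
    Both are rounded to one decimal place, as in the table.
    """
    ratio = round(index / trend * 100, 1)
    return ratio, round(ratio - seasonal, 1)


for month, index, trend, seasonal in ROWS_1911:
    ratio, cycle = cyclical_variation(index, trend, seasonal)
    print(f"{month:<8} ratio = {ratio:6.1f}   cycle = {cycle:+5.1f}")
```

The same function could be applied to every month of every year once the annual trend values and the twelve seasonal indexes are available.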




The monthly cyclical variations for other years would be determined
in the same manner as those in this table. In Tables CII
and CIII the cyclical variations were measured in units of the
mortality index. If it is desirable to compare the cyclical variations
of one social series with those of another, this cannot be done accurately
when these variations are expressed in units of the variable.
The difficulty can be overcome, however, by expressing the
cyclical variations in terms of their respective standard deviations.
After the computation of cycles this is a simple process. The cyclical
variations are squared; the sum of the squares is divided by
the number of years, or months, and the square root of the quotient
equals the standard deviation of the cyclical variations. Then each
cyclical variation is divided by the standard deviation. This will be
illustrated by the cyclical variations of the mortality indexes and of
poor relief in Indiana for the same years. Table CIV shows the
process:


TABLE CIV

Transformation of Cyclical Variations in Units of the Variable to Units of
Standard Deviation

              Mortality Indexes                      Poor Relief Indexes

                            Cycles in                              Cycles in
                  Cycles    Units of σ                   Cycles    Units of σ
Year    Cycles    Squared   = (2) ÷ 5.70      Cycles     Squared   = (5) ÷ 32.45
(1)     (2)       (3)       (4)               (5)        (6)       (7)

1911     -2.8       7.84      -.49             -1.5         2.25     -.04
1912      -.6        .36      -.11              7.7        59.29      .24
1913       .9        .81       .16             -3.6        12.96     -.11
1914     -2.0       4.00      -.35             42.3      1789.29     1.30
1915     -4.6      21.16      -.81             64.7      4186.09     2.00
1916       .7        .49       .12             12.2       148.84      .37
1917      6.0      36.00      1.05               .5          .25      .02
1918     19.8     392.04      3.46            -14.1       198.81     -.44
1919     -5.1      26.01      -.89            -37.5      1406.25    -1.15
1920      2.8       7.84       .49            -44.7      1998.09    -1.38
1921     -7.4      54.76     -1.30             -5.3        28.09     -.16
1922     -8.1      65.61     -1.42              6.8        46.24      .21
1923       .1        .01       .02            -55.2      3047.04    -1.76
1924     -2.2       4.84      -.38            -26.6       707.56     -.82
1925     -1.8       3.24      -.32            -25.6       655.36     -.79
1926      2.9       8.41       .51            -17.1       292.41     -.53
1927     -1.5       2.25      -.26               .3          .09      .01
1928      1.9       3.61       .33              3.4        11.56      .10
1929      3.2      10.24       .56              9.4        88.36      .29
1930     -1.0       1.00      -.18             79.9      6384.01     2.46


The standard deviations have been computed from the data in
columns (2) and (5). Columns (4) and (7) give the cyclical
variations in units of standard deviation. These are seen to be
much more nearly of the same size than the variations expressed in
units of the variables. To compare the cyclical variations more
closely, columns (4) and (7) may be plotted. Figure LXVIII presents
this comparison. The solid horizontal line represents zero deviation
from the trend lines.
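The transformation to units of standard deviation can be checked with a short Python sketch. The two lists reproduce columns (2) and (5) of Table CIV; the function names are assumptions introduced here for illustration. Following the text, the standard deviation is taken as the square root of the mean of the squared cyclical variations, without subtracting their mean.

```python
from math import sqrt

# Annual cyclical variations, 1911 to 1930, from columns (2) and (5) of Table CIV.
MORTALITY = [-2.8, -0.6, 0.9, -2.0, -4.6, 0.7, 6.0, 19.8, -5.1, 2.8,
             -7.4, -8.1, 0.1, -2.2, -1.8, 2.9, -1.5, 1.9, 3.2, -1.0]
POOR_RELIEF = [-1.5, 7.7, -3.6, 42.3, 64.7, 12.2, 0.5, -14.1, -37.5, -44.7,
               -5.3, 6.8, -55.2, -26.6, -25.6, -17.1, 0.3, 3.4, 9.4, 79.9]


def sigma_of_cycles(cycles):
    """Square root of (sum of squared cycles / number of items)."""
    return sqrt(sum(c * c for c in cycles) / len(cycles))


def in_sigma_units(cycles):
    """Divide each cyclical variation by the standard deviation of the series."""
    sigma = sigma_of_cycles(cycles)
    return [c / sigma for c in cycles]


print(round(sigma_of_cycles(MORTALITY), 2))    # divisor used in column (4)
print(round(sigma_of_cycles(POOR_RELIEF), 2))  # divisor used in column (7)
print([round(v, 2) for v in in_sigma_units(MORTALITY)][:5])
```

Because the divisor here is carried to full precision rather than rounded to two places, an occasional value may differ by one unit in the second decimal from the printed columns.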

While there is some similarity between the variations of the two 
series, it is not close. The degree of similarity can be tested by 
the method of correlation. 

5. CORRELATION OF TIME SERIES 

The correlation of time series presents some special problems 
which do not appear when dealing with other types of distribu- 
tions. The trend values and seasonal fluctuations of time series 
should not be treated by the method of correlation. The produc- 
tion of pig iron and the production of potatoes may both have 
upward trends, and a coefficient of correlation between the two 
series would perhaps be large, but it would be without significance 
because there is no reason to expect that these two series are func- 
tionally related. Seasonal fluctuations are related to specific 
conditions which affect particular series each year. If there is inter- 
dependence between two time series, it will be between the cyclical 
fluctuations. Consequently, before using the method of cor- 
relation for the study of time series the trend and the seasonal 
factor should be removed by methods already illustrated. A line 
of trend should be fitted to the data and a seasonal index com- 
puted. Then these variations may be subtracted from the original 
data, and the cyclical variations will be left. The usual methods of 
correlation may then be applied. 

For purposes of illustration it is desirable to have two series of
data which show marked correlation when the dependent variable
is lagged, though they may show only slight correlation when the
two variables are treated synchronously. For this illustration two
series have been taken from Dr. Dorothy S. Thomas’ study, made
in England and Wales, of the relation of the business cycle to
other social series.11 The series are the business cycles and the
phthisis, or tuberculosis, death rates which she computed for the
years 1875 to 1894. Of the four periods studied, this shows the
closest correlation of phthisis death rates, lagged two years, with
the business cycle. The cycles in both cases are expressed as percentage
deviations of the annual items from secular trend, expressed
in terms of standard deviation of each series. It will be
remembered that the correlation for time series is to be computed
for cycles only; since Dr. Thomas has already computed the
cyclical variations of her data, it is not necessary here to repeat the
process of determining these quantities. Table CV gives the first
steps in computing the degree of correlation between the business
cycle and the phthisis death rates when taken synchronously:12

11 Thomas, D. S., op. cit., pp. 187, 188, 197.

TABLE CV

Correlation of Phthisis Death Rates and the Business Cycle, 1875 to 1894,
for England and Wales 1

         Business      Phthisis
         Cycle:        Death Rates:
         Deviations    Deviations
         from Trend    from Trend                            xy-Products
Year     x             y              x²         y²         -xy        +xy

1875       .29           .67          .0841      .4489                 .1943
1876      -.11           .31          .0121      .0961      .0341
1877      -.34           .00          .1156      .0000                 .0000
1878     -1.07          1.01         1.1449     1.0201     1.0807
1879     -1.71           .31         2.9241      .0961      .5301
1880       .34         -1.41          .1156     1.9881      .4794
1881       .55         -1.41          .3025     1.9881      .7755
1882       .85          -.46          .7225      .2116      .3910
1883      1.60           .55         2.5600      .3025                 .8800
1884       .36           .34          .1296      .1156                 .1224
1885      -.61           .00          .3721      .0000                 .0000
1886     -1.19           .18         1.4161      .0324      .2142
1887      -.55         -1.25          .3025     1.5625                 .6875
1888       .08         -1.50          .0064     2.2500      .1200
1889       .82          -.95          .6724      .9025      .7790
1890      1.02          1.93         1.0404     3.7249                1.9686
1891       .64           .98          .4096      .9604                 .6472
1892      -.37          -.86          .1369      .7396                 .3182
1893     -1.57           .00         2.4649      .0000                 .0000
1894     -1.02         -1.32         1.0404     1.7424                1.3464

Negative sums   -8.54    -9.16
Positive sums    6.55     6.28
Totals          -1.99    -2.88      15.9727    18.1818    -4.4040     6.1646


1 Thomas, op. cit., pp. 187, 197.

12 For a comparison of the degrees of correlation found by using a variety of
lags for social data, when correlated with an economic index, see Hexter,
Maurice B., Social Consequences of Business Cycles, especially Chap. VIII.
Boston: Houghton Mifflin Co., 1925.





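\[
c_x = \frac{-1.99}{20}, \qquad c_x^2 = .0100
\]

\[
c_y = \frac{-2.88}{20}, \qquad c_y^2 = .0196
\]

\[
\sigma_x = \sqrt{\frac{15.9727}{20} - .0100} = .888
\]

\[
\sigma_y = \sqrt{\frac{18.1818}{20} - .0196} = .943
\]

\[
r = \frac{\dfrac{6.1646 - 4.4040}{20} - .0140}{(.888)(.943)} = +.105
\]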

This coefficient is quite low; it suggests that, if the phthisis death 
rate is correlated with changes in the business cycle, the effect is 
not synchronous with the change in the business cycle. When it is 
suspected that the changes in the dependent variable may occur 
later than changes in the independent variable, experiment with 
various lags is indicated. For purposes of illustration here, how- 
ever, only the two-year lag of the phthisis death rate will be used. 
It has been found by Dr. Thomas that significant correlation 
exists between the business cycle and the phthisis death rate, when 
the latter is lagged two years. Table CVI gives the first steps in 
the computation of this coefficient of correlation. 

The business cycles from 1875 to 1892 are used, and the phthisis 
cycles from 1877 to 1894. That is, when we speak of lagging the 
phthisis rate two years, we mean that the business cycle for 1875 
is correlated with the phthisis rate of 1877 and so on throughout 
the 20-year period. The substitution in the formulas is identical 
with the substitutions shown for the data above without lag. 
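The pairing of years that a lag implies can be made concrete with a short Python sketch. The two lists reproduce the x and y columns of Table CV; the function names `pearson_r` and `lagged_r` are assumptions introduced here. The coefficient is computed with the same moment formulas used above (mean product less the product of the means, divided by the two standard deviations), but directly from the x and y columns rather than from the printed product columns, so the results may differ somewhat in the second decimal from the printed coefficients (the synchronous value comes out nearer .09 than the printed .105).

```python
from math import sqrt

# Cycles in units of standard deviation, 1875 to 1894, from Table CV.
BUSINESS = [0.29, -0.11, -0.34, -1.07, -1.71, 0.34, 0.55, 0.85, 1.60, 0.36,
            -0.61, -1.19, -0.55, 0.08, 0.82, 1.02, 0.64, -0.37, -1.57, -1.02]
PHTHISIS = [0.67, 0.31, 0.00, 1.01, 0.31, -1.41, -1.41, -0.46, 0.55, 0.34,
            0.00, 0.18, -1.25, -1.50, -0.95, 1.93, 0.98, -0.86, 0.00, -1.32]


def pearson_r(xs, ys):
    """Product-moment coefficient of correlation for two equal-length series."""
    n = len(xs)
    cx, cy = sum(xs) / n, sum(ys) / n
    sigma_x = sqrt(sum(x * x for x in xs) / n - cx * cx)
    sigma_y = sqrt(sum(y * y for y in ys) / n - cy * cy)
    mean_product = sum(x * y for x, y in zip(xs, ys)) / n
    return (mean_product - cx * cy) / (sigma_x * sigma_y)


def lagged_r(xs, ys, lag):
    """Correlate x of year t with y of year t + lag (lag in whole years)."""
    if lag == 0:
        return pearson_r(xs, ys)
    # Drop the last `lag` items of x and the first `lag` items of y,
    # so that 1875 business is paired with 1877 phthisis when lag = 2.
    return pearson_r(xs[:-lag], ys[lag:])


for lag in (0, 1, 2, 3):
    print(f"lag {lag}: r = {lagged_r(BUSINESS, PHTHISIS, lag):+.3f}")
```

Running the loop over several lags is one way to carry out the "experiment with various lags" mentioned in the text; with a lag of two years only 18 pairs remain.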





TABLE CVI

Correlation of Phthisis Death Rates and the Business Cycle, 1875 to 1894,
for England and Wales: Phthisis Death Rates Lagged Two Years

                       Phthisis
         Business      Death Rates:
         Cycle:        Deviations
         Deviations    from Trend,
         from Trend    Lagged Two Years                      xy-Products
Year     x             y              x²         y²         -xy        +xy

1875       .29           .00          .0841      .0000      .0000
1876      -.11          1.01          .0121     1.0201      .1111
1877      -.34           .31          .1156      .0961      .1054
1878     -1.07         -1.41         1.1449     1.9881                1.5087
1879     -1.71         -1.41         2.9241     1.9881                2.4111
1880       .34          -.46          .1156      .2116      .1564
1881       .55           .55          .3025      .3025                 .3025
1882       .85           .34          .7225      .1156                 .2890
1883      1.60           .00         2.5600      .0000                 .0000
1884       .36           .18          .1296      .0324                 .0648
1885      -.61         -1.25          .3721     1.5625                 .7625
1886     -1.19         -1.50         1.4161     2.2500                1.7850
1887      -.55          -.95          .3025      .9025                 .5225
1888       .08          1.93          .0064     3.7249                 .1544
1889       .82           .98          .6724      .9604                 .7836
1890      1.02          -.86         1.0404      .7396      .8772
1891       .64           .00          .4096      .0000      .0000
1892      -.37         -1.32          .1369     1.7424                 .4884

Positive sums    6.55     5.30
Negative sums   -5.95    -9.16
Totals            .60    -3.86      12.4674    17.6368    -1.2501     9.0725


\[
r = \frac{\dfrac{9.0725 - 1.2501}{18} - (.03)(-.21)}{(.832)(.967)} = +.548
\]

This coefficient is moderately high. It suggests that the effects of
a change in the business cycle upon the phthisis death rate are
considerable, two years after the change in the business cycle.13

13 Both the above coefficients of correlation differ slightly from those Dr.
Thomas published. The differences are doubtless due to minor variations in
procedure.

Attention should be called to the fact that the probable errors
of the two preceding coefficients of correlation have not been computed.
Hitherto we have dealt with the correlation of frequency
distributions, where random sampling was assumed and where
there was also assumed to be no relation between individual items 
of a single series. The situation is different in time series. Writing 
on this subject, Professor Warren M. Persons says: “There is a 
special objection to the application of the theory of probability to 
the particular economic data [time series] which constitute our 
material. If the theory of probability is to apply to our data, not 
merely the series but the individual items of the series must be 
a random selection. In fact, a group of successive items with a 
characteristic conformation constitutes our material. Since the in- 
dividual items are not independent, the probable errors of the 
constants [such as coefficients of correlation] of a time series, com- 
puted according to the usual formulas, do not have their usual 
mathematical meaning. . . . Granting as one must that con- 
secutive items of a statistical series are, in fact, related makes in- 
applicable the mathematical theory of probability.” 14 Persons goes 
on to say that actually we do not know what, if any, meaning 
probable errors in time series would have. For that reason it is 
best not to compute them until some satisfactory method of calcu- 
lating the range of variation is found. 

Of course, the question of fitting a line of trend always arises 
in the correlation of cyclical fluctuations. The business cycles used 
here are the averages for several series of economic data used by 
Dr. Thomas: she used third degree parabolas for some of them 
and straight lines for others. The trend line for the phthisis death 
rate is a third degree parabola. Perhaps for beginning students the 
simplest line of trend, and the most flexible, is the moving aver- 
age, unless it is fairly obvious that a straight line or a simple parab- 
ola will fit the data. But an exceedingly good case can be made 
out for the use of the moving average. 15 For preliminary purposes 
a freehand curve may be drawn through the plotted data; this is 
a rough guess at the trend. 

14 Persons, W. M., “Some Fundamental Concepts of Statistics,” Jour. Amer.
Stat. Ass’n, Vol. XIX, New Series No. 145, March, 1924, p. 7.

15 See Macaulay, Frederick R., The Smoothing of Time Series. New York:
National Bureau of Economic Research, 1931.

6. EXERCISES

1. The following table gives the number of active cases carried
by the Indianapolis Family Welfare Society from 1916 to
1931:


TABLE CVII

Active Cases of the Indianapolis Family Welfare
Society by Years

Year    Cases        Year    Cases

1916    1028         1924    3227
1917    1446         1925    2638
1918    1534         1926    3048
1919    1474         1927    3872
1920    1306         1928    3690
1921    2501         1929    3106
1922    3605         1930    3997
1923    2499         1931    6169


(a) Fit a straight line to these data; a logarithmic curve; a
second degree parabola; a four-year moving average.

(b) Find the mean deviation of the original data from each
line of trend. Which shows the smallest mean deviation?

2. Fit a curve to the growth of population of the United States.

TABLE CVIII

Population of the United States at Each Census, 1790 to 1930

Year    Population        Year    Population

1790     3,929,214        1870     38,558,371
1800     5,308,483        1880     50,155,783
1810     7,239,881        1890     62,947,714
1820     9,638,453        1900     75,994,575
1830    12,866,020        1910     91,972,266
1840    17,069,453        1920    105,710,620
1850    23,191,876        1930    122,775,046
1860    31,443,311




3. The following table gives the active case load of the Indianapolis
Family Welfare Society from 1924 to 1931 by months:

TABLE CIX

Month        1924    1925    1926    1927    1928    1929    1930    1931

January      1329     847    1314    1673    1489    1528    2096    3450
February     1323     846    1230    1550    1459    1497    1992    3627
March        1088     850    1376    1440    1368    1371    1904    3518
April         760     745     981    1285    1147    1099    1678    3052
May           632     651     877    1086     983     931    1444    2335
June          582     7«9     853    1034     978     853    1284    1682
July          542     718     788     983     875     885    1166    1508
August        548     694     749     985     823     840    1155    1492
September     500     598     732    1041     868     845    1100    1632
October       610     719     750    1013     911     903    1301    2038
November      698     842    1127    1163    1084    1360    1903    2495
December      822    1209    1537    1412    13*8    1941    2951    3274




(a) Compute seasonal indexes for the Family Welfare Data 
by the three methods discussed in this chapter. 

(b) Compare the three indexes. Which seems best? Why? 
How would you use these indexes in planning a budget 
and employing personnel? 

(c) Compute seasonal indexes of dependency for your own 
city or state. 

4. Cyclical Variations: 

(a) Determine the cyclical variations for the data in Table 
CVII. 

(b) Determine the cyclical variations for the data in Table 
CIX. 

(c) Compare these with some index of general business, for 
which corrections have been made for trend and seasonal 
variations. Are the variations similar? Does one series lag 
behind the other? 

5. Correlation of time series: 

(a) Compute the degree of correlation between the cyclical 
variations found in Exercise 4(a) and the cyclical varia- 
tions in the index of general business which you used. 

(b) Lag the relief case load by one year and compute the de- 
gree of correlation. Is there any significant difference be- 
tween the size or sign of the two coefficients? 

(c) Take two other time series, suggested by the instructor, 
that are believed to be related and compute the degree of 
correlation between the cyclical variations. This will be 
more interesting if the data used are local. 

7. REFERENCES 

Chaddock, R. E., Principles and Methods of Statistics, Chap.
XIII.

Chapin, F. S., “Dependency Indexes for Minneapolis,” Social
Forces, Vol. V, No. 2, pp. 215-224.

Falkner, H. D., “The Measurement of Seasonal Variation,” Jour.
Amer. Stat. Ass’n, Vol. XIX, No. 146, pp. 167-179.

Hall, Lincoln W., “Seasonal Variation as a Relative of Secular
Trend,” Jour. Amer. Stat. Ass’n, Vol. XIX, No. 146, pp.
156-166.

Macaulay, F. R., The Smoothing of Time Series.

Mills, F. C., Statistical Methods, Chaps. VII, VIII, XI.

Thomas, D. S., Social Aspects of the Business Cycle, Chaps. I
and II and Appendix A.



CHAPTER XIV 


Vital Statistics 


1. THE SCOPE OF VITAL STATISTICS

In most extant books on statistical methods the subject of vital 
statistics is not given separate treatment, but since the facts are of 
great importance in the study of social problems and since there 
are some specific methods applicable to them, it seems desirable 
in a book of this kind to give this branch of statistical methods 
special consideration. Many of the methods previously discussed 
may be applied to vital statistics, after special methods have been 
used to bring the analysis to a certain point. Average death rates 
or birth rates over a period of time or in different localities may
be desired; dispersions may be determined, index numbers computed,
and correlations calculated. But in most cases some preliminary
work should be done on the vital statistics before the application
of these methods, and it is with this preliminary analysis
that this chapter is mainly concerned. 

What kinds of data may be called vital statistics? This question
has sometimes been answered narrowly for administrative purposes
as statistics of births and deaths; but it may be answered
more broadly to include almost any kind of non-social data referring
to human beings. Sometimes marriages are included, though
they are social as well as biological matters. Whipple arrives at a
definition of “vital statistics” through an analysis of the different
divisions of demography. These divisions, he says, are genealogy,
human eugenics, the census of population, registration of vital
facts, vital statistics, biometrics, and pathometrics.1 Vital statistics,
according to Whipple, is the application of the statistical method
to the study of vital facts, such as birth, marriage, divorce, sickness,
and death. He omits the other divisions of demography.
Pearl says, somewhat differently, “‘Vital statistics,’ for which a
better term is biostatistics, is the special branch of biometry which
concerns itself with the data and laws of human mortality, morbidity,
natality, and demography.”2 These two definitions of vital
statistics are not very harmonious, though they were given by two
of the leading men who concern themselves with the types of data
mentioned. Pearl regards vital statistics as a special branch of
biometry but includes demography as a division of it. Whipple
thinks of biometry as a division of demography coordinate with
the division of vital statistics. Even if it were possible, it is unnecessary
for our purposes to have a definition upon which everybody
would agree. We shall simply take a few kinds of data
usually regarded as vital statistics and illustrate methods of studying
them. These types of facts are: population growth, marriages,
births, deaths, and morbidity. Other types of facts which concern
the social statistician might be included, but there is no doubt of
the inclusion of any of these five.

1 Whipple, G. C., Vital Statistics, p. 2.

2. POPULATION GROWTH

Population in a given geographical area increases because of 
births and immigration, and decreases because of deaths and emi- 
gration. The net result depends upon whether or not there is an 
excess of births and immigrants over deaths and emigrants. We 
are accustomed to expect an increasing population in all the great 
nations, but there are smaller areas in which population has de- 
clined and is declining. The statistician is concerned with both 
the quantity and the quality of changes in the population and 
with the possibility of forecasting future changes. It is much easier 
to measure past changes than it is to estimate changes that will 
take place in the future, but for many purposes it is desirable to 
make estimates with due allowance for a margin of error. The 
basis for estimating changes in population in the United States is 
the decennial census plus certain other data, such as births, deaths, 
immigration, and emigration. A rough way of estimating the popu- 
lation in intercensal years is the arithmetic method without refer- 
ence to births, deaths, etc. For example, the population of con- 
tinental United States in 1920 was 105,710,620 and in 1930 it 
was 122,775,046, which represents an increase of 17,064,426. If 
the population had increased the same amount each year, what 
would have been the population January 1, 1925?

2 Pearl, Raymond, Medical Biometry and Statistics, p. 21. Philadelphia: W. B.
Saunders Co., 1930.

[Figure LXIX. Actual Population of the United States, 1870-1920, and
Projection of the Curve to 1930. Vertical axis: millions.]

Since the time
between the census in 1920 and the census in 1930 was not 10
years but 10.25 years, we can divide the decennial increase by 
10.25 to get the annual increase. The mean annual increase would 
be 1,664,822. If we multiply this figure by 5, we get 8,324,110 
as the estimated increase, making a total estimated population of 
114,034,730 for the country. Births, deaths, and migration have 
not been considered. This is the simplest way of estimating the 
population in an intercensal year but, of course, it is open to con- 
siderable error. By the same method we might assume that the 
rate of increase which obtained between 1920 and 1930 continued 
in 1931. We could then add 1,664,822 to' 122,775,046 and get 
124,439,868 for the population in 1931. But the longer this con- 
stant rate of increase is assumed, the larger the error is likely to 
be. Birth rates, death rates, and net increments or decrements 
from migrations change. Consequently, this arithmetic method is 
even approximately valid only for a short period of time, such as 
a decade. Allowance for births, deaths, and migration will be 
discussed below, when Dr. Whelpton’s method of estimating 
population growth is considered. 
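The arithmetic method just described amounts to a straight-line interpolation, which can be sketched in Python as follows. The function name and its parameters are assumptions introduced here; the census figures and the 10.25-year interval are those given in the text.

```python
def arithmetic_estimate(p_start, p_end, years_between, years_elapsed):
    """Estimate population by assuming the same numerical increase each year.

    p_start, p_end : census counts at the beginning and end of the period
    years_between  : time between the two censuses, in years
    years_elapsed  : time from the first census to the date wanted
    """
    annual_increase = (p_end - p_start) / years_between
    return p_start + annual_increase * years_elapsed


P1920 = 105_710_620   # census of January 1, 1920
P1930 = 122_775_046   # census of April 1, 1930

# Interpolation: January 1, 1925 is 5 years after the 1920 census.
print(round(arithmetic_estimate(P1920, P1930, 10.25, 5)))

# Extrapolation: one further year beyond the 1930 census (11.25 years elapsed).
print(round(arithmetic_estimate(P1920, P1930, 10.25, 11.25)))
```

The same function serves for interpolation and extrapolation; only `years_elapsed` changes, which makes plain why the error grows as the assumed constant increase is carried further from the census dates.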

Another way of estimating population change by the arithmetic
method is to plot the census data to the natural scale and project
the curve for future years. Figure LXIX shows the changing
population of the United States from 1870 to 1920 and then projects
the curve to 1930 to illustrate this method of estimation.
If a freehand projection of the curve of population from 1920 to
1930 is drawn, the estimated population for 1930 is about 120,000,000,
which is more than 2,500,000 less than the census. If
the same increase in population occurred from 1920 to 1930 as
from 1910 to 1920, and if this amount is added to the census of
1920, the estimated population in 1930 is 119,448,620, which is
likely to be a more exact way of making the estimate than is the
graph, though in this case it happens to be less reliable. But in either
case the error is considerable.

Where a large population is concerned, the geometric method 
of estimating population increase may be used. This method con- 
siders, not the absolute increase from one decade to another, but 
the percentage change. The formula for determining the rate of 
growth by the geometric method is as follows: 


\[
\log (1 + r) = \frac{\log P_1 - \log P_0}{N}
\]

in which $r$ is the annual rate of increase, $P_1$ the population at the
end of the period, $P_0$ the population at the beginning of the
period, and $N$ the number of years in the period. If the aim is
to interpolate the population for intercensal years, the period 
chosen would be the decennium in which the interpolation is to be 
made. On the other hand, if the aim is to extrapolate (estimate 
population in future years) the population, we may use a period 
longer than 10 years so that the effect of long-time trend is more 
pronounced. For purposes of illustration we shall extrapolate the 
population for 1930, using 1870 to 1920 as the base period. 


\[
\begin{aligned}
\log (1 + r) &= \frac{\log 105{,}710{,}000 - \log 38{,}550{,}000}{49.5} \\
             &= \frac{8.024116 - 7.586115}{49.5} \\
             &= \frac{.438031}{49.5} = .008848 \\
1 + r        &= 1.02059 \\
r            &= 1.02059 - 1 = .02059, \text{ or 2.059 per cent per year increase.}
\end{aligned}
\]
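The geometric computation can be repeated in Python as a rough check. This is a sketch under stated assumptions: the function names are introduced here, the unrounded census counts of 1870 and 1920 are used in place of the rounded figures in the worked example, and the 59.75-year span is the interval from the census of June 1, 1870, to the census of April 1, 1930.

```python
from math import log10


def annual_rate(p0, p1, years):
    """Solve log(1 + r) = (log P1 - log P0) / N for r."""
    return 10 ** ((log10(p1) - log10(p0)) / years) - 1


def extrapolate(p0, rate, years):
    """Carry a population forward at a constant annual percentage rate."""
    return p0 * (1 + rate) ** years


P1870 = 38_558_371    # census of June 1, 1870
P1920 = 105_710_620   # census of January 1, 1920

r = annual_rate(P1870, P1920, 49.5)
estimate_1930 = extrapolate(P1870, r, 59.75)

print(f"annual rate of increase: {r:.5f}")
print(f"estimated population, April 1, 1930: {estimate_1930:,.0f}")
```

Because logarithms are carried to full machine precision here, the estimate agrees with the hand computation only to about three significant figures; the conclusion, an overestimate of several millions, is unaffected.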


The census in 1870 was taken June 1, and in 1920 on January 
1. So the period is 49.5 years. In 1930 the census was taken April 
1. Hence the period from the 1870 census to the 1930 census was 
59.75 years. The estimated population for 1930 would be 130,295,017,
or more than 7.5 millions more than it actually was. The
rate of change had not been constant during this long period. Be- 
tween 1870 and 1890 the rate of increase each year was near 3 
per cent. Between 1910 and 1920 the annual percentage increase 
was about 1.5, and between 1920 and 1930 it was about 1.6. 
Hence, for extrapolation it is better to use the rate obtaining in the 
decade immediately preceding the period for which estimates are 
required. By the geometric method, using the period 1910 to 
1920 as a base period, the estimated population, April 1, 1930, 
would be 122,771,705, which is 3,341 less than the census figure. 
This error is much less than the error involved in a 50-year base 
period. The geometric method may be used graphically also. Fig- 
ure LXX shows the graphic method for the period 1870 to 1930. 
The graphic method is not useful for extrapolating the population, 
but it may be used for interpolating if only round numbers are 



STATISTICAL ANALYSIS 389 

required. Figure LXX shows how the population in 1925 (note 
the broken lines) may be roughly estimated. It would be in the 

MILLIONS 

300 


200 


114 

100 

90 

80 

70 

60 

50 

40 

30 


20 


1880 1890 1900 1910 1920 '25 1930 

Figure LXX. — Growth of the Population of the United States 

neighborhood of 1 14*000,000. If computations are to be based 
upon the estimated population, more exact methods are required. 
Dr. P. K. Whelpton, of the Scripps Foundation for Population
Research, has presented a method of estimating population growth
which involves only arithmetic, but it requires a great deal of de- 
tailed information not easily accessible to the average investigator 
using population data. 3 Dr. Whelpton bases his estimates upon 
survival rates for various age groups, birth rates of urban and 
rural white and negro considered separately, immigration rates, 
and rural-urban migration rates. He shows that there are good 
reasons for expecting a continuous decline in both the rate of in- 
crease and the numerical increase, and estimates that the popula- 
tion in 1975 will be about 175,120,000. There is a great deal to 
be said for this method of estimating population changes as against 
too much reliance upon more involved mathematical methods 
which make assumptions about the logistic nature of population 
growth. However, it is not a practical method for the student, 
because the special data are not readily available to him. 

Sometimes it is desirable to know the population for a certain 
age, which is not given in the census returns. When the popula- 
tion is given only in age groups, a method of redistributing it 
according to the required age is, therefore, useful. This is done by 
means of a cumulative, or summation, curve. For instance, the 
1930 population of Indianapolis by age groups was as follows: 

TABLE CX

Census of Indianapolis by Age Groups 1

Age Group     Number by     Upper Limit     Persons Less Than
in Years      Age Group     of Age Group    Upper Limit Age

Under 1          5,345            1                5,345
 1- 4           22,304            5               27,649
 5- 9           30,274           10               57,923
10-14           27,112           15               85,035
15-17           16,094           18              101,129
18-19           12,204           20              113,333
20-24           33,155           25              146,488
25-29           33,288           30              179,776
30-34           31,587           35              211,363
35-44           58,116           45              269,479
45-54           44,908           55              314,387
55-64           28,761           65              343,148
65-74           14,905           75              358,053

               358,905                           358,905


1 Census of Indianapolis by Census Tracts, Indianapolis Census
Committee, 1931. Table I.

3 Whelpton, P. K., “Population of the United States, 1925 to 1975,” Amer.
Jour. Soc., Vol. XXXIV, No. 2, pp. 253-269.





Suppose it is desired to know the approximate number of the popu- 
lation who are 26 to 28 years of age. This age group could not be 
obtained from the reports of the census, but it can be estimated 
graphically as follows: 

[Figure LXXI. Cumulative Curve of Indianapolis Population, 1930, and
Estimation of Population 26 to 28 Years of Age. Vertical axis: thousands.]


The two broken parallel lines cut the curve at 26 and the upper 
limit of 28 years. In round numbers the population under 26 is 
152,000 and under 29 is 173,000. The difference, 21,000, is the 
approximate population 26 to 28 years of age, inclusive. This is 
about as near as the population can be estimated, but it would be 




39 * 


SOCIAL STATISTICS 


satisfactory for some purposes, such as providing a base for com- 
putation of specific rates. 
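A numerical counterpart to the graphic estimate can be sketched in Python. It is only an approximation of the procedure in the text: the graph uses a smooth freehand curve, whereas this sketch assumes straight-line interpolation between the tabulated points, and the function name `persons_under` is introduced here. The cumulative figures are those of Table CX (the entry for age 5 is the sum 5,345 + 22,304).

```python
# (upper limit of age group, persons less than that age), from Table CX.
CUMULATIVE = [
    (1, 5_345), (5, 27_649), (10, 57_923), (15, 85_035), (18, 101_129),
    (20, 113_333), (25, 146_488), (30, 179_776), (35, 211_363),
    (45, 269_479), (55, 314_387), (65, 343_148), (75, 358_053),
]


def persons_under(age):
    """Read the cumulative curve at `age`, interpolating linearly between points."""
    points = [(0, 0)] + CUMULATIVE
    for (a0, n0), (a1, n1) in zip(points, points[1:]):
        if a0 <= age <= a1:
            return n0 + (n1 - n0) * (age - a0) / (a1 - a0)
    raise ValueError("age outside the tabulated range")


# Persons 26 to 28 years of age, inclusive: under 29 less under 26.
estimate = persons_under(29) - persons_under(26)
print(round(persons_under(26)), round(persons_under(29)), round(estimate))
```

The straight-line reading gives roughly 20,000 persons, a little below the 21,000 read from the freehand curve, which illustrates how much the result depends on the form of curve assumed within the age group.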

3. MARRIAGE AND DIVORCE RATES 

Marriage rates may be computed in several ways, but the first 
problem is to define a marriage. A marriage is the union of a man 
and a woman in a given year or at any time whatsoever. Marriage 
rates may be rates of marriage within the year, or they may be the 
rates for all married persons in the population regardless of when 
they were married. A married person for the latter classification
is anyone living with his or her spouse; widowed and divorced
persons are not included.

The marital status of the population is often an important factor 
in the study of a variety of social problems. If the mean age at 
the time of marriage rises, one of the effects is to reduce the length 
of the childbearing period and, hence, the birth rate. Both the 
rate and the age of marriage vary with racial and national groups 
and with urban and rural populations, and the rate of marriage 
varies according to the ratio of men to women in the population, 
being highest when the ratio is considerably greater than 1.00. 
As the percentage of women gainfully employed increases, the per- 
centage married appears to decrease. Death, crime, insanity, and 
pauper rates seem to be lower for married persons than for others. 
As divorce rates increase, there is a decrease in the social and 
biological significance of marriage. These observations indicate the 
importance of marriage to the work of the social statistician and 
emphasize to the student the importance of knowing how to com- 
pute marriage rates. 4 

The marriage rate in the United States for a given year would 
be the number of marriages consummated per 1,000 population 
over 15 years of age at the middle of the year. The rate of total
marriage in the population at a given year is usually expressed as 
the percentage of the population over 15 years of age which is mar- 
ried. 5 The rates of total marriage may be refined by computing the 
percentage by sex and by age groups. If comparisons are to be made 
between years or decades, or between different geographical areas, 
this refinement is important because a peculiar variation in an age 

4 For a comprehensive statistical study of marriage, from which the above
statements are derived, see Groves, E. R., and Ogburn, W. F., American Mar- 
riage and Family Relationships, especially Chaps. X-XVII, XIX. New York: 
Henry Holt & Co., 1928. 

5 Groves and Ogburn, op. cit., Chap. XI.





group or in the sex ratio may explain differences in the total mar- 
riage rates. 

Divorce rates are likewise of two kinds: rates for the total num- 
ber of married persons and for the number of marriages consum- 
mated in a given year. In the first instance, the rate is expressed 
as the number of divorced persons in the population per 1,000 
married persons in the population, and, in the second instance, 
divorces are expressed as the ratio of marriages to divorces. The 
use to be made of the rates will determine which kind of rate should 
be used. 
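
The marriage and divorce rates defined in this section reduce to four simple ratios. The sketch below states them in Python; the function names and every figure in the example are hypothetical and serve only to show which quantity goes in each numerator and denominator.

```python
def marriage_rate(marriages, population_over_15):
    """Marriages in the year per 1,000 population over 15 at midyear."""
    return 1000 * marriages / population_over_15


def percent_married(married_persons, population_over_15):
    """Rate of total marriage: percentage of the population over 15 married."""
    return 100 * married_persons / population_over_15


def divorced_per_1000_married(divorced_persons, married_persons):
    """Divorced persons in the population per 1,000 married persons."""
    return 1000 * divorced_persons / married_persons


def marriages_per_divorce(marriages, divorces):
    """Ratio of marriages to divorces in a given year."""
    return marriages / divorces


# Hypothetical community (illustrative figures only).
rate_of_marriage = marriage_rate(4_500, 300_000)
total_marriage = percent_married(180_000, 300_000)
divorced_rate = divorced_per_1000_married(3_600, 180_000)
marriage_divorce_ratio = marriages_per_divorce(4_500, 900)
```

The same functions can be applied separately to each sex or age group when the refined rates described above are wanted.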


4. BIRTH RATES 

Births are reported as births and stillbirths. Of course, a still- 
birth is a birth, but for purposes of clarity in the statistics it has 
become the custom to report the two kinds separately. For this 
reason it may be assumed, unless known to be otherwise, that a 
published birth rate is concerned with live-births only. 

The “crude birth rate” is the number of live-births per 1,000 
population in the year or month for which the rate is computed. 
This is the usual kind of rate published, though for some purposes 
the “refined birth rate” is preferable. The refined birth rate is the 
number of births per 1,000 women 15 to 44 years of age; it may 
be refined still further by expressing the rate as the number of 
births per 1,000 married women between the ages of 15 and 44. 
The trend in birth rates in the United States from 1919 to 1928 
is shown in the following table: 

TABLE CXI 

Birth Rates, Excluding Stillbirths, in the Registration 
Area of the United States 1 


Year    Rate per 1,000        Year    Rate per 1,000
          Population                    Population

1919         22.3             1924         22.6
1920         23.7             1925         21.4
1921         24.3             1926         20.6
1922         22.5             1927         20.6
1923         22.4             1928         19.7


1 Birth Statistics , United States Census, 1928, p. 4. Rates are 
based upon reports from the official Registration Area which in- 
cludes all states except Nevada, New Mexico, South Dakota, 
and Texas. 

When birth rates are computed by months, they are expressed as
if the rate for each month were an annual rate. For example, if
the number of births in a city in the month of January is 1,000,
this number is divided by the number of thousands of population,
or women 15 to 44 years of age, and the result is multiplied by
365/31, which gives a rate in terms of a year. The denominator of
the fraction is always the number of days in the month for which
the births are reported. The numerator is 366 in leap years.
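
The conversion of a monthly figure to an annual rate can be sketched in Python as follows. The count of 1,000 January births is the example in the text; the base population of 500,000, the years chosen, and the function name are assumptions made only for illustration.

```python
import calendar


def annualized_monthly_birth_rate(births, base_population, year, month):
    """Monthly births per 1,000 of the base, expressed as an annual rate."""
    days_in_month = calendar.monthrange(year, month)[1]
    days_in_year = 366 if calendar.isleap(year) else 365
    monthly_rate = births / (base_population / 1000)
    return monthly_rate * days_in_year / days_in_month


# 1,000 January births; the population and the year are hypothetical.
january_rate = annualized_monthly_birth_rate(1000, 500_000, 1931, 1)

# February of a leap year uses the fraction 366/29.
february_rate = annualized_monthly_birth_rate(1000, 500_000, 1928, 2)
```

The base may be the total population, for a crude rate, or the number of women 15 to 44 years of age, for a refined rate.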

Refined birth rates touch upon another matter which is omitted 
entirely by the crude rate, and that is fecundity. Fecundity is the 
productivity of women in terms of the number of children born, 
or, in another sense, it is the physiological capacity of women to 
conceive. If fecundity is thought of as actual productivity, then 
a rate is obtained by dividing the number of births by the number 
of thousands of women 15 to 44 years of age or by the number 
of married women within those ages. It is the latter to which 
Whipple refers in his discussion of fecundity. 6 Fecundity rates in
this sense can be refined by computing fecundity by age groups. 
Such calculations are important in estimating population by 
Whelpton’s method. But fecundity in the other sense referred to 
is less easy to measure. What proportion of women are sterile? 
What proportion of men are sterile? The fact that a couple does 
not have any children is not a satisfactory basis for the assumption 
of sterility in one or both married partners. The use of contra- 
ceptive methods accounts for the childlessness of some couples. 
Because of the difficulty of determining physiological fecundity in 
any large number of persons, reliable rates cannot be computed 
at the present time. 
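
Fecundity in the sense of actual productivity is a rate per 1,000 women, computed for all women 15 to 44, for married women of those ages, or for separate age groups. The sketch below is illustrative only: the function name and all counts are hypothetical.

```python
def births_per_1000_women(births, women):
    """Births per 1,000 women in the chosen class."""
    return 1000 * births / women


# Hypothetical counts for one year (not drawn from any census).
births = 6_000
women_15_to_44 = 80_000
married_women_15_to_44 = 50_000

rate_all_women = births_per_1000_women(births, women_15_to_44)
rate_married_women = births_per_1000_women(births, married_women_15_to_44)

# Refinement by age group, again with hypothetical counts.
births_by_age = {"15-24": 2_400, "25-34": 2_700, "35-44": 900}
women_by_age = {"15-24": 30_000, "25-34": 27_000, "35-44": 23_000}
rates_by_age = {
    age: births_per_1000_women(births_by_age[age], women_by_age[age])
    for age in births_by_age
}
```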

It will be noticed in Table CXI that the birth rate has been 
declining in recent years. That appears to be a general phenome- 
non in all Western countries. A trend line could easily be fitted 
to these rates by methods previously discussed and illustrated. We 
may also ask whether birth rates show cyclical and seasonal varia- 
tions. As Thomas has shown, there are slight cyclical variations 
which are correlated with the business cycle. 7 These may be com- 
puted by the usual method of determining cyclical changes. There 
is little evidence to warrant belief that birth rates have marked 
seasonal variations, at least in the United States. 8 
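
As an illustration of fitting a trend line to the rates in Table CXI, the sketch below applies an ordinary least-squares straight line, which is assumed here to be one of the methods the text has in mind. The rates are those of Table CXI; the function name and the choice of 1919 as origin are illustrative.

```python
years = list(range(1919, 1929))
rates = [22.3, 23.7, 24.3, 22.5, 22.4, 22.6, 21.4, 20.6, 20.6, 19.7]


def fit_line(xs, ys):
    """Least-squares slope and intercept of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Measure time in years elapsed since 1919 to keep the intercept readable.
elapsed = [year - 1919 for year in years]
slope, intercept = fit_line(elapsed, rates)
trend = [intercept + slope * x for x in elapsed]
```

On these ten values the fitted slope is negative, a decline of roughly 0.4 births per 1,000 population per year, which agrees with the decline noted in the text.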


6 Op. cit., pp. 247-249.

7 See Thomas, D. S., op. cit., Chap. IV. 

8 See White, R. Clyde, “The Human Pairing Season in America,” Amer.
Jour. of Soc., Vol. XXXII, No. 5, pp. 800-805.





5. DEATH RATES 

Death rates constitute an important part of vital statistics. There 
are general death rates based upon the number of deaths per 1,000 
population and specific death rates for age groups and particular 
diseases. The latter may be based upon the number of deaths per 
100,000 population or upon the number of deaths per 1,000 per- 
sons in the particular group. 

The general, or crude, death rate is the one with which most 
people are familiar, and yet it has serious limitations for compara- 
tive purposes. It is fairly satisfactory for comparing the death rates 
at different times for the same area, provided the age and sex con- 
stitution of the population remain reasonably constant. On the 
other hand, when comparisons are made between general death 
rates for different areas, it is always an open question whether or 
not the rates are comparable on account of the possibility of important
differences in age and sex constitution. Table CXII gives
the annual general death rates for the registration area of the 
United States from 1919 to 1928, inclusive: 


TABLE CXII 

General Death Rates for the Registration Area of the 
United States, 1919 to 1928 1 


Year    Rate per 1,000        Year    Rate per 1,000
          Population                    Population

1919         12.9             1924         11.8
1920         13.1             1925         11.8
1921         11.6             1926         12.2
1922         11.8             1927         11.4
1923         12.3             1928         12.0


1 Mortality Statistics, United States Census, 1928. All states 
except Nevada, New Mexico, South Dakota, and Texas are in 
the Registration Area. 


During this period of 10 years the age and sex constitution of the 
population changed some, but the chances are that, if the popula- 
tion were standardized for these two factors, no great change 
would be apparent in the rates. However, it would be inaccurate to 
compare these rates with the general death rates for a particular 
state or with New England. The age and sex constitution for the 
state of Washington would be quite different from that of the 
country as a whole, and it would differ markedly from that of New 
England. Some method must be found to obtain a “corrected death 




rate.” This will involve using specific death rates for age groups 
and then combining them into a general rate. 

The principle of the standard million of population must be 
introduced to compute the corrected death rate. The next table, 
from Pearl, gives the distribution of a standard million of popu- 
lation, both sexes together: 

TABLE CXIII 


Standard Million of Actual Living Persons (Both Sexes) in the United
States, 1910 1


Age in Years    Persons per Million      Age in Years    Persons per Million
                   in Age Group                             in Age Group

   0-4               115,806                55-59              30,358
   5-9               106,321                60-64              24,696
  10-14               99,203                65-69              18,294
  15-19               98,728                70-74              12,132
  20-24               98,656                75-79               7,269
  25-29               89,104                80-84               3,505
  30-34               75,947                85-89               1,338
  35-39               69,672                90-94                 365
  40-44               57,314                95-99                  80
  45-49               48,682               100-104                 39
  50-54               42,491




1 Pearl, Raymond, Medical Biometry and Statistics , p. 262. 


The formula used by Pearl to compute the corrected death rate is: 

$$R_{co} = 1000 \, \frac{\sum L_x R_{sx}}{\sum L_x}$$

in which

$R_{co}$ = a corrected death rate

$L_x$ = the number of persons of age $x$ in the standard population
$R_{sx}$ = the specific death rate at age $x$ observed in the particular
locality for which the corrected rate is being calculated

Before this formula can be applied, the specific death rates for 
different age groups in the particular locality under consideration 
must be computed. The equation for such specific death rates is: 9

$$R_s = 1000 \, \frac{D_e}{E}$$

in which

$R_s$ = specific death rate

$D_e$ = deaths in a specified class of the population
$E$ = number exposed to risk of dying, in the same specified
class of the population from which the deaths come;
age, sex, etc., might be basis of exposure

9 Pearl, op. cit., p. 212.





It will be seen that this equation can be used to compute the spe- 
cific death rate for infants, for tuberculous patients, or for puer- 
peral septicemia. We shall make use of it for computing specific 
death rates for age groups as a means of arriving at the corrected 
general death rate. The specific death rates for Indianapolis from 
September 1, 1930, to August 31, 1931, are computed in Table
CXIV: 

TABLE CXIV 


Specific Death Rates in Indianapolis, September 1,
1930, to August 31, 1931

Age Group        Deaths     Population     Specific
                                          Death Rates
   (1)             (2)          (3)           (4)

  0-4              709        27,649         25.6
  5-9               91        30,274          3.0
 10-14              47        27,112          1.7
 15-19              81        28,298          2.9
 20-24             132        33,155          4.0
 25-29             145        33,288          4.4
 30-34             199        31,587          6.3
 35-44             423        58,116          7.3
 45-54             646        44,908         14.4
 55-64             828        28,761         28.8
 65-74             896        14,905         61.1
 75 and over       866         5,683        152.4

                 3,953       363,736
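
Column (4) of Table CXIV follows directly from the equation for specific death rates. The sketch below recomputes it in Python from the deaths and population as they are reproduced here; the variable and function names are illustrative. On these counts the 65-74 group comes out at 60.1 rather than the 61.1 shown in the table; the sketch simply reports the computed value and does not attempt to decide whether the printed count or the printed rate is at fault.

```python
# Deaths and population by age group, as reproduced in Table CXIV.
deaths = {
    "0-4": 709, "5-9": 91, "10-14": 47, "15-19": 81, "20-24": 132,
    "25-29": 145, "30-34": 199, "35-44": 423, "45-54": 646,
    "55-64": 828, "65-74": 896, "75 and over": 866,
}
population = {
    "0-4": 27_649, "5-9": 30_274, "10-14": 27_112, "15-19": 28_298,
    "20-24": 33_155, "25-29": 33_288, "30-34": 31_587, "35-44": 58_116,
    "45-54": 44_908, "55-64": 28_761, "65-74": 14_905, "75 and over": 5_683,
}


def specific_death_rate(deaths_in_class, exposed):
    """Deaths per 1,000 persons exposed to risk in the same class."""
    return 1000 * deaths_in_class / exposed


specific_rates = {
    group: round(specific_death_rate(deaths[group], population[group]), 1)
    for group in deaths
}
```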



These specific death rates will now be used to compute a corrected
death rate for the city of Indianapolis. Table CXV gives the com- 
putations required for the equation (see following page). 

The totals in columns (2) and (4) will now be used in the equa- 
tion: 


$$R_{co} = 1000 \times \frac{12607.8}{1000000} = 12.6$$


This figure, 12.6, is the corrected death rate for Indianapolis, 
that is, it is the death rate Indianapolis would have had if the city 
had the same population distribution the country as a whole had 
in 1910. This rate can now be compared with a corrected death 
rate for any other city of the country. 
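
The computation of the corrected rate can be retraced in a few lines of Python. The weights and specific rates below are those of Table CXV, with the standard population expressed in thousands, so that each product is the number of deaths expected in that age group of the standard million. The variable names are illustrative.

```python
# Standard million by age group, in thousands (Table CXV, column 2).
standard_thousands = [
    115.806, 106.321, 99.203, 98.728, 98.656, 89.104,
    75.947, 126.986, 91.173, 55.054, 30.426, 12.596,
]
# Indianapolis specific death rates per 1,000 (Table CXV, column 3).
specific_rates = [
    25.6, 3.0, 1.7, 2.9, 4.0, 4.4,
    6.3, 7.3, 14.4, 28.8, 61.1, 152.4,
]

# Expected deaths in the standard million (total of column 4).
expected_deaths = sum(
    weight * rate for weight, rate in zip(standard_thousands, specific_rates)
)

# Corrected death rate per 1,000 of the standard population.
standard_population = 1000 * sum(standard_thousands)
corrected_rate = 1000 * expected_deaths / standard_population
```

Because the weights add to one million, the result is simply the average of the local specific rates weighted by the standard age distribution, and it rounds to the 12.6 given in the text.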

Attention may be called to the fact that a corrected death rate 
is a weighted average of the local specific death rates. The weights 




TABLE CXV 

Expected Deaths in Indianapolis, September 1, 1930, to
August 31, 1931

Age Group        Persons in Actual      Specific       (2) X (3)
                 Population per        Death Rates
                 Million, in
                 Thousands
   (1)                (2)                  (3)            (4)

  0-4               115.806               25.6          2964.6
  5-9               106.321                3.0           319.0
 10-14               99.203                1.7           168.6
 15-19               98.728                2.9           286.3
 20-24               98.656                4.0           394.6
 25-29               89.104                4.4           392.1
 30-34               75.947                6.3           478.5
 35-44              126.986                7.3           927.0
 45-54               91.173               14.4          1312.9
 55-64               55.054               28.8          1585.6
 65-74               30.426               61.1          1859.0
 75 and over         12.596              152.4          1919.6

                                                       12607.8


consist of the proportions of the population in each age group of 
the standard million of population. 10

Two other kinds of corrections may be made in the computa- 
tion of death rates: Some persons die in a locality who do not 
live there; for example, at a large general hospital which serves
no definite geographical area. Some persons who live in the com- 
munity die away from the locality. Should consideration be given 
to these facts, or can we assume that as many non-residents will 
die in the city as residents die away from the city? An exact death 
rate would have to take these questions into consideration. It 
might happen that a city had particularly elaborate hospital facili- 
ties and that more non-residents would die in the city and be re- 
ported to the local authorities than the number of residents dying 
away from the city. In the Indianapolis data used above only 
persons who had a residence in the city were used. No check could 
be obtained on those who died away from the city. Consequently, 
both specific and general death rates are lower than they should 
be. For ordinary purposes, it may be assumed that the residents 
dying away from home and the non-residents dying in the city 
are equal; for more exact calculations, their equality or inequality
should be determined if possible. 

10 See Pearl, op. cit., pp. 171-174. 





It is well known that seasonal variations occur in death rates, 
that there are cyclical variations, and that over a long period of 
time a secular trend is perceptible. These measures may be deter- 
mined after the manner described and illustrated in Chapter XIII. 

6. MORBIDITY 

Social statisticians, as well as public health officials, are inter- 
ested in sickness, or morbidity. They would like to know the case 
rates in the population for many particular diseases, but, because 
sickness is so generally regarded as a personal matter, reliable data 
on the prevalence of disease are almost nil. This is not so true 
of what are known as “reportable diseases,” that is, infectious 
diseases which the attending physician is required by law to re- 
port to some central health agency. Even some of the infectious 
diseases, for example, gonorrhea and syphilis, are not reported 
regularly because the physician regards his relation to his patient 
as personal and confidential, and declines to list his private patient 
among those having certain infectious diseases with which a social 
stigma is associated. The United States Public Health Service 
gets weekly reports from American consuls for the following dis- 
eases: cerebrospinal meningitis (epidemic); cholera, Asiatic; 
cholera nostras, cholerine, or gastroenteritis; diphtheria; measles; 
plague, human; plague, rodent; poliomyelitis (acute anterior
poliomyelitis or infantile paralysis); scarlet fever; smallpox;
tuberculosis; typhoid fever (enteric fever, typhus abdominalis); typhus
fever (typhus exanthematicus); and yellow fever. 11 Similar reports
are received from local health officials within the United
States for chicken pox, diphtheria (carriers not included), influ- 
enza, measles, mumps, pneumonia (all forms), scarlet fever, 
smallpox, tuberculosis (all forms), typhoid fever, whooping 
cough, cerebrospinal fever, dengue, lethargic encephalitis, pellagra,
poliomyelitis (infantile paralysis), rabies (in man) (developed
cases), rabies (in animals), typhus fever. 12 The non-reportable
diseases, which are non-infectious or only slightly so, are not re- 
ported to health agencies with sufficient completeness to make the 
data reliable. State and city health departments often try to get 
these diseases reported, but there is no way of determining what 

11 Public Health Reports, United States Public Health Service, February 6,
1931, p. 285. 

12 Ibid., p. 286. 





percentage of the total cases are reported. To obtain adequate 
statistics of morbidity, for both infectious and non-infectious dis- 
eases, is a problem of health organization and of persuasion of 
the medical profession of the public interest at stake in all forms 
of disease. 

Because of the lack of adequate morbidity statistics, the vital 
statistician and the student of social problems are strongly tempted 
to assume that there is a constant ratio between the number of 
deaths from a specific disease and the total number of cases of the 
disease. If this were a fact, the number of cases could be inferred 
from the ratio of mortality to morbidity, but this is unreliable. 
Discussing this question, Pearl says: “Mortality is not and never 
can be a good index of morbidity, generally speaking. What actu- 
ally is done is to weaken and impair the value of the statistics for 
the study of mortality in the hope to make them a little better 
indices of morbidity. ... It is thought desirable to get as complete
records as possible of the prevalence of cancer in the population,
as a disease. Therefore, the rule is that, in general, if a
person dies who is known to have had cancer prior to death, the 
death is charged to cancer. In consequence, it results that no one 
can get from the official statistics an accurate answer to the ques- 
tion: ‘How many persons per 1000 living did cancer kill in 1920?’ 
Instead, what he gets is information as to how many persons died 
per 1000 living in 1920, who had cancer before they died, assum- 
ing that the diagnosis is correctly made in every case. The latter 
information, as anyone with a logical mind will at once perceive, 
is quite different from the former.” 13 

If morbidity rates are to be computed, fully understanding that 
they are open to wide margins of error, they may be crude or 
specific rates. If they are crude rates, then the number of cases 
per 100,000 population is the usual measure for specific diseases. 
Specific case rates would be determined from the number of cases 
per 1,000 persons belonging to the class exposed — e.g., age group, 
sex, etc. Under any circumstances the results warrant only very 
limited confidence. 
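
The two kinds of case rate differ only in their base. A minimal sketch in Python follows; the function names and the counts are hypothetical, and, as the text warns, any rate computed from incompletely reported cases deserves only limited confidence.

```python
def crude_case_rate(cases, population):
    """Reported cases of a disease per 100,000 population."""
    return 100_000 * cases / population


def specific_case_rate(cases_in_class, persons_in_class):
    """Reported cases per 1,000 persons in the class exposed."""
    return 1000 * cases_in_class / persons_in_class


# Hypothetical counts for one disease in one year.
crude = crude_case_rate(730, 365_000)
children_5_to_9 = specific_case_rate(45, 30_000)
```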



7. EXERCISES 

1. The following table gives the population of New York City
from 1900 to 1930 inclusive: 

13 Op. cit., p. 103. 





TABLE CXVI 


Population of New York City, 1900 to 1930 


Year      Population        Year      Population

1900      3,437,202         1920      5,620,048
1910      4,766,883         1930      6,930,446


(a) Compute the rate of growth of population in each decen- 
nium and for the 30-year period by the arithmetic method. 

(b) Compute the rate of growth of population in each decen- 
nium and for the 30-year period by the geometric method. 

(c) Compare the results obtained from using the arithmetic 
and geometric methods. 

(d) Estimate the population at each intercensal year between 
1920 and 1930. Using the same basis of estimate, what 
would you expect the population to be in 1940? 

Note: The census was taken on the following dates: 1900, 
June 1; 1910, April 15; 1920, January 1; 1930, April 1.

2. Table CXVII gives the number of persons out of work in Illi- 
nois at the time of the United States Census in 1930: 

TABLE CXVII 


Persons Out of a Job, Able to Work, and Looking for a
Job, Class A, Illinois, April, 1930 1

Age Group        Number        Age Group        Number

10-14 years          84        45-49 years      21,237
15-19 years      23,205        50-54 years      16,758
20-24 years      36,447        55-59 years      12,548
25-29 years      27,808        60-64 years       9,186
30-34 years      23,490        65-69 years       5,494
35-39 years      24,678        70 and over       2,788
40-44 years      23,035        Unknown             241

1 Unemployment Bulletin, Illinois, United States Bureau of the
Census, 1931.


(a) Determine graphically the approximate number unem- 
ployed who are 26 but less than 29 years of age. 

(b) Determine graphically the approximate number unem- 
ployed who are 46 but less than 48 years of age. 

3. Table CXVIII gives the births by months in Indiana for 1928 
to 1930, inclusive: 




TABLE CXVIII 

Births in Indiana, 1 1928 to 1930, by Months. Population
of Indiana: 1928, 3,176,000; 1929, 3,207,689; 1930,
3,238,000

Month            1928       1929       1930

January         4,962      4,552      4,733
February        4,646      4,445      4,433
March           5,147      5,095      4,795
April           4,629      4,552      4,583
May             4,594      4,584      4,647
June            4,479      4,506      4,544
July            4,770      4,984      4,996
August          4,825      4,801      4,992
September       4,572      4,361      4,565
October         4,575      4,359      4,446
November        4,428      4,303      4,241
December        4,560      4,645      4,300


1 Monthly Bulletin, Indiana State Board of Health, January,
1928, to December, 1930.


(a) Find the crude birth rates for each month, expressed as 
annual rates, for the above data. 

(b) For a seasonal index of births to be reliable more years 
are required than are given in this table, but this may be 
used for illustrative purposes. Compute the seasonal varia- 
tions in births, if any. 

(c) Plot the crude birth rates. Is there evidence of a cyclical 
decline following in the wake of the depression which 
began in 1929? 

Note: The population data may be computed from the 
census reports. 

4. Table CXIX gives the deaths in the United States from 1914 
to 1928: 




TABLE CXIX 

Deaths from All Causes in the United States, 1914 
to 1928, and the Estimated Population of the Regis- 
tration Area 1 


Year      Estimated         Deaths
          Population

1914      65,813,315         898,059
1915      67,095,681         909,155
1916      71,349,162       1,001,921
1917      74,984,498       1,068,932
1918      81,333,675       1,471,367
1919      85,166,043       1,096,436
1920      87,486,713       1,142,558
1921      88,667,602       1,032,009
1922      93,241,643       1,101,863
1923      96,986,371       1,193,017
1924      99,200,298       1,173,990
1925     103,108,000       1,219,019
1926     105,167,000       1,285,927
1927     108,327,000       1,236,949
1928     114,495,000       1,378,675


1 Mortality Statistics, United States Bureau of the Census,
1928.


(a) Compute the crude death rate for the United States from 
1914 to 1928. 

(b) Fit a line of trend to these rates. 

(c) Compute the cyclical variations of these death rates. 

5. Table CXX gives the number of deaths in the United States 
in five-year age-intervals for the year 1928: 

TABLE CXX 

Deaths in the United States in Five-Year Intervals,
1928, and the Estimated Population in Each Interval
for the Registration Area 1

Age Group        Estimated          Deaths
                 Population

  0-4            12,479,955         216,090
  5-9            12,365,460          25,245
 10-14           11,563,995          19,494
 15-19           10,190,055          33,226
 20-24           10,075,560          43,445
 25-29            9,846,570          44,062
 30-34            8,701,620          46,454
 35-39            8,472,630          56,754
 40-44            6,869,700          62,218
 45-49            6,297,225          70,759
 50-54            5,151,275          82,319
 55-59            3,892,830          89,367
 60-64            3,205,860         101,676
 65-69            2,289,900         117,229
 70-74            1,488,435         118,904
 75-79              915,960         107,293
 80-84              457,980          78,343
 85-89              114,495          43,173
 90 and over         57,248          20,164
 Unknown            114,495           2,460

 Total          114,552,248 2     1,378,675

1 Mortality Statistics, United States Bureau of the Census,
1928.

2 The total is a little higher than the estimate for the 
whole registration area, as given by the census, because 
the percentages in each age group have been carried to 
only one decimal place, and the percentage for the group 
90 and over is only .05 per cent and is not given in the 
reports of the census. The population for this group has 
been estimated by the author. 

(a) Compute a corrected death rate from the above data. How 
does it compare with the crude rate for 1928? 

6. Obtain the mortality data for your own state, if available, and 

compute: 

(a) The crude death rates from 1919 to 1928. How do they 
compare with the national rates? 

(b) The corrected death rate for 1928. How does it compare 
with the national rate? 

8. REFERENCES 

Newsholme, Sir Arthur, Elements of Vital Statistics, New Edition.

Pearl, Raymond, Medical Biometry and Statistics, Chaps. VII-IX.

Thomas, Dorothy S., Social Aspects of the Business Cycle, Chaps.
III-V.

Whipple, G. C., Vital Statistics, Chaps. IV-XII.



CHAPTER XV 


Rating Scales 


1. THE FUNCTION OF RATING SCALES

The discussion and illustration of rating scales might have been 
given in Chapter V, “Collection and Assembling of Data,” but 
there is a logical difference between the kind of data discussed in 
that chapter and the kind sought by means of a rating scale. The 
data discussed in Chapter V are obtained mainly by a counting 
scheme, whereas a rating scale is intended to show degrees of 
difference in a single variable. The methods of statistical analysis 
described in preceding chapters may be applied to data gathered 
by means of a rating scale, but the theory of the rating scale is 
of sufficient importance to justify treatment in a separate chapter. 

During the past decade increasing emphasis has been placed 
upon measurement in psychology, education, and the social sci- 
ences. Many of the traits to be measured, however, present great 
technical problems for two reasons: first, because we are not in the 
habit of thinking of them in quantitative terms, and, second, be- 
cause the invention of measuring sticks is difficult. However, 
experimentation has given some valuable, and perhaps more hope- 
ful, results. Is a man a pacifist or a militarist, or does he seem to 
occupy a middle-of-the-road position with respect to these two 
popular conceptions? Is it possible to mark off degrees of attitude 
toward war, ranging from complete pacifism to complete mili- 
tarism? Are all blind persons blind in the same degree, or should 
definite degrees of blindness be distinguished? Are there degrees 
of psychoneurotic personality, or is psychoneurotic personality a 
fixed and definite thing like pneumonia? The attitudes or condi- 
tions mentioned are in fact variables. But how are we to measure 
the degrees of variableness? That is the function of a rating scale. 
Scales of pacifism-militarism and of blindness must be devised. The 
scale will be analogous to the division of length into feet and 



inches, and variations infinitely small may be indicated. That is, 
the assumption back of the rating scale is that attitudes and social 
conditions are continuous variables. 

It may be objected that no accurate scale, comparable to linear 
measure, can be devised for attitudes and social conditions com- 
monly assumed to be qualitative and, hence, not the proper objects 
of measurement. Until recently this viewpoint was generally accepted,
but the introduction of statistical concepts into physics and
chemistry has tended to change it. It has been seen that successive 
measurements of the same material object do not agree exactly, if 
the unit of measurement is made indefinitely small. These suc- 
cessive measurements tend to distribute themselves in a normal 
frequency curve. Consequently, it is logical to reason that, even if 
an attitude cannot be measured exactly by different people or by 
the same persons at different times, attempts at measurement are 
justified and normal errors are to be expected. We cannot say that 
one science is quantitative and that another is totally lacking in 
this characteristic. The physical, biological, and social sciences 
might themselves be arranged along a rating scale according to 
degrees of precision of measurement attainable in the present state 
of scientific technique. It is quite likely that the standard deviation 
of measurements of an attitude by means of a rating scale would 
be larger than the standard deviation of successive measures of the 
expansion of a piece of steel under specified temperatures, but, if 
the validity of the rating scale can be determined, the results are 
reliable within the range of ascertained error. 

Experimentation with rating scales has proceeded far enough to 
reveal certain tests for reliability and validity. For convenience we 
may classify rating scales into sociometric and attitude scales. The 
sociometric scale is used for measuring aspects of social institutions, 
and the attitude scale is used for measuring the mental set of an 
individual toward a certain type of reaction. (Perhaps a third type 
should be mentioned, such as the test for degree of blindness 
given below, but this type is really a physiometric scale and is 
mentioned in this book only because of the social implications of 
blindness.) The first is concerned with material culture or physical
conditions; the second is concerned with the reaction organization 
of an individual which has been built up in a cultural and physical 
environment. 

Six tests for the reliability and validity of a sociometric scale 





may be distinguished. (1) Reliability may be tested by having 
different observers rate the same subject. The degree of correlation 
between these ratings constitutes a measure of the reliability of the 
scale. Different persons should be able to rate the same subject 
similarly, if the scale is reliable. 1 (2) The scale must have general 
validity. That is, it must not be seriously affected in validity when 
it is applied to different subjects of the same class. If it is a home 
rating scale, it should be applicable to homes in any city or rural 
community. Furthermore, the degree of correlation between the 
results obtained on a given scale and on some other scale that has 
been standardized for the same class of subjects should be high, 
if the given scale is valid. (3) The scale must make possible the 
establishment of reasonable norms for the subjects to which it is 
to be applied. This norm will be statistical and will be represented 
by a curve of distribution which is normal for this class of objects; 
the curve may approach the form of a normal curve of error, or it 
may be skewed. But application to a random sample of the subjects 
should make possible the establishment of a norm. For such a 
distribution measures of central tendency and dispersion may be 
computed. (4) Factors which enter into the construction of the 
scale should be generally available to the investigator. Availability 
implies accessibility to the subjects to be studied and reasonably 
exact definition of terms. If the terms are ambiguous, the relia- 
bility and validity of the scale are likely to be low. (5) Assuming 
the availability of all factors of importance to the problem, the 
scale should take into consideration all significant aspects for which 
quantitative evaluations can be secured. Here the judgment of the 
worker is paramount. He may resort to experiment to determine 
what are the important factors and aspects of factors, or he may 
rely upon his own judgment and that of other qualified persons. 
For example, in constructing a scale for rating the living rooms of 
homes, all the objects which have diagnostic value should find a 
place in the scale. (6) Each factor in the scale should be weighted 
according to its relative significance. If we are trying to measure 
the importance of blindness as a social problem in a state by means 
of an index, it is of first importance to know what weight to attach 
to the blindness of a person who cannot distinguish light from 
darkness and to the blindness of a person who can walk around 

1 Lundberg, G. A., op. cit., pp. 248-252. Quoted from Gould, K. M., A Sociometric
Scale for American Cities, pp. 53-57. M. A. Thesis, Columbia University,
1921. Points (2) to (6) above are taken from this source.




but cannot read. Some standard of significance of factors must be 
set up; it is similar to the problem of weighting prices to obtain
a price index. Weighting is an important problem in the establish- 
ment of the validity of a given scale. If different weights are used 
in the given scale and in some other standardized scale with which 
the given scale is to be compared, the results obtained and the 
degree of correlation discovered between results might be low 
because of the difference of weights. Consequently, weights also 
must be tested for validity. 
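
The effect of the weights upon an index may be seen in a small sketch. The counts of persons and both sets of weights below are hypothetical and are chosen only for illustration; the names `counts`, `equal_weights`, `graded_weights`, and `weighted_index` are likewise assumptions, not part of any published index.

```python
# Hypothetical counts of blind persons in a state, by degree of blindness.
counts = {"light perception only": 120, "traveling sight": 300, "cannot read": 580}

# Two hypothetical weighting schemes (illustrative assumptions only).
equal_weights = {"light perception only": 1, "traveling sight": 1, "cannot read": 1}
graded_weights = {"light perception only": 3, "traveling sight": 2, "cannot read": 1}


def weighted_index(counts, weights):
    """Weighted sum of the counts divided by the total count (a weighted mean)."""
    total = sum(counts.values())
    return sum(counts[k] * weights[k] for k in counts) / total


equal_index = weighted_index(counts, equal_weights)
graded_index = weighted_index(counts, graded_weights)
print(equal_index, graded_index)
```

The same persons yield a different index under each scheme, which is why two scales that weight the same factors differently may correlate poorly.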

Somewhat similar tests for reliability and validity may be used 
for attitude scales. Dr. Goodwin B. Watson has used the following 
tests of the validity of results obtained in the use of a test of 
“fair-mindedness”: 2 (1) Examination of the tests with reference
to what they seem to be measuring; (2) correlations between each
form of the test and the test as a whole; (3) a study of the scores
obtained by individuals who are selected by their group as most
fair-minded; (4) individuals who are supposed, by those who
know them well, to have pronounced lines of prejudice are given 
the test, and their reactions compared with those which would be 
anticipated; (5) certain groups who might be supposed to possess 
certain lines of prejudice are studied by the test and the result 
compared with the assumptions of competent judges as to the lines 
of prejudice that might be expected to exist within the given 
groups; (6) the tests are examined to determine to what extent 
they are measures of intelligence or opinion rather than of preju- 
dice. Thurstone employed what he called objective tests of relia- 
bility and validity to his rating scale for attitude toward the 
church. 3 They are: (1) the probable error of the scale value, which
is equal to the product of half the standard deviation of the scale 
values and the standard error; (2) the existence of ambiguous 
statements in the test is determined by the scale-distance, the 
X-value, between the first and third quartiles: if the distance is 
great, and the curve flat, the statement is ambiguous; (3) the 
existence of an irrelevant statement is determined by comparing 
the ratings on other statements of similar character; if the state- 
ment is relevant, the other ratings should be distributed in the 
form of a normal frequency curve. Thurstone’s criteria are wholly 

2 Watson, G. B., The Measurement of Fair-Mindedness, p. 19. Teachers College,
Columbia University, 1925.

3 Thurstone, L. L., and Chave, E. J., The Measurement of Attitude, pp. 42-56. 
University of Chicago Press, 1929. 





objective and should be applied to the results obtained from any 
attitude scale used. 
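
The second criterion, the distance between the first and third quartiles, may be sketched as follows. The placements of the two statements by nine judges are invented for illustration, the function name `scale_value_and_q` is hypothetical, and the use of the median as the scale value is an assumption of this sketch rather than a rule stated in the text.

```python
from statistics import median, quantiles


def scale_value_and_q(placements):
    """Return (median, Q3 - Q1) of the piles (1 to 11) assigned to one statement."""
    q1, _, q3 = quantiles(placements, n=4, method="inclusive")
    return median(placements), q3 - q1


# Invented placements of two statements by nine judges.
clear = [2, 2, 3, 3, 3, 4, 4, 4, 5]
ambiguous = [1, 2, 4, 5, 6, 7, 8, 10, 11]

print(scale_value_and_q(clear))
print(scale_value_and_q(ambiguous))
```

A wide interquartile distance, as for the second statement, marks the statement as a candidate for rejection on the ground of ambiguity.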

Four types of rating scales will be used for purposes of illustra- 
tion: (1) the scale for blindness developed by the Committee on 
Central Statistics of the Blind; (2) Chapin’s Scale for Rating 
Living Room Equipment; (3) the Matthews revision of Wood- 
worth’s Psychoneurotic Inventory; and (4) Thurstone and 
Chave’s scale for measuring attitudes toward the church. 

2. A BLINDNESS SCALE 

We are accustomed to think of blindness as a unitary term. A 
blind person is simply one who cannot see. But some persons who 
are classified as blind cannot distinguish light from darkness, while 
others are able to walk about unaided but cannot read. Obviously 
there are degrees of blindness. The Committee on Central Statis- 
tics of the Blind has taken the Snellen scale for measuring visual 
perception and has given the following descriptive terms to the 
five degrees of blindness recognized by Snellen: (1) totally blind 
or having “light perception only”; (2) having “motion percep- 
tion” and “form perception”; (3) having “traveling sight”; 
(4) able to read large headlines; (5) “borderline” cases. For each 
of these classes Snellen has exact measurements of the amount of 
visual perception. The scale is reproduced on page 410. 

This scale is interesting for two reasons as an illustration of 
quantitative analysis of a trait: first, because it makes clear the fact 
that blindness is a variable, and, second, because it shows the 
“rough tests for lay workers” in a column parallel to the Snellen 
measurements. To most people a blind person is simply a blind 
person; qualifications to suggest degree of blindness are not made. 
We see here a trait that may be measured exactly by means of the 
Snellen scale. The five divisions are chiefly for the “lay worker.” 
Actually blindness exists in all gradations, however small, from 
total absence of visual perception to the so-called borderline cases. 
It is a continuous variable, and in a list of 1,000 blind persons who
had been tested we should expect to find Snellen measures continuous
from 0 to some point arbitrarily chosen as the maximum
visual perception consistent with the definition of blindness. The 
rough test given for lay workers makes blindness a discontinuous 
variable for no other reason than that rough tests are guesses and 
not measures. The pigeonhole type of social classification is well 




PROPOSED TABLE FOR UNIFORM GROUPING OF THE BLIND BY
AMOUNT OF VISUAL PERCEPTION

Group 1. Totally blind or having “light perception” only 2
    Snellen measurements 1 of visual perception:
        At various distances (feet): 0 up to but not including 2/200
        At a fixed distance (20 feet): 0 up to but not including 20/2000
        At a fixed distance (6 meters): 0 up to but not including 6/600
    Rough tests for lay workers: No vision, or light perception only, 2
        up to but not including perception of motion of hand at a
        distance of 3 feet (arm’s length) or less

Group 2. Having “motion perception” and “form perception”
    Snellen measurements 1 of visual perception:
        At various distances (feet): 2/200 up to but not including 5/200
        At a fixed distance (20 feet): 20/2000 up to but not including 20/800
        At a fixed distance (6 meters): 6/600 up to but not including 6/240
    Rough tests for lay workers: Ability to perceive motion or form of
        hand at a distance of 3 feet (arm’s length) or less, up to but
        not including ability to count fingers at a distance of 3 feet
        (arm’s length)

Group 3. Having “traveling sight”
    Snellen measurements 1 of visual perception:
        At various distances (feet): 5/200 up to but not including 10/200
        At a fixed distance (20 feet): 20/800 up to but not including 20/400
        At a fixed distance (6 meters): 6/240 up to but not including 6/120
    Rough tests for lay workers: Ability to count fingers at a distance
        of 3 feet (arm’s length), up to but not including ability to
        read large letters (such as newspaper headlines)

Group 4. Able to read large headlines
    Snellen measurements 1 of visual perception:
        At various distances (feet): 10/200 up to but not including 20/200
        At a fixed distance (20 feet): 20/400 up to but not including 20/200
        At a fixed distance (6 meters): 6/120 up to but not including 6/60
    Rough tests for lay workers: Ability to read large letters (such as
        newspaper headlines), up to but not including ability to read
        large print (larger than 14-point type)

Group 5. “Borderline” cases 3
    Snellen measurements 1 of visual perception:
        At various distances (feet): 20/200 or more
        At a fixed distance (20 feet): 20/200 or more
        At a fixed distance (6 meters): 6/60 or more
        but not sufficient for use in an occupation or activity for
        which eyesight is essential.
    Rough tests for lay workers:
        A. Ability to read 14-point type but not 10-point type.
        B. Ability to read 10-point type but with a defect of vision
           (such as limited field, etc.) so great as to be a marked
           handicap.

1 All measurements and tests apply to vision in the better eye after correction.

2 “Light perception” is defined to mean just sufficient vision to distinguish light from darkness.

3 Examination by an eye physician is recommended for all cases but individuals in group 5 should not
be finally classified except upon the basis of such an examination. Certain eye conditions such as high
progressive myopia, greatly restricted field of vision, etc., may constitute such a severe handicap in
activities for which eyesight is essential that even when the individual has a visual acuity of 20/200 or
more, he is, for occupational purposes, blind. These are the “borderline” cases.

This classification has been drawn up by the Committee on Central Statistics of the Blind.

illustrated by the rough tests, and is sharply contrasted with the 
statistical conception of variability. Continued experimentation 
with rating scales should result in a gradual decrease in the use 
of the former and a gradual increase in the use of the latter. 
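
The passage from the continuous Snellen measure to the five groups of the table may be sketched as follows. The limits are those of the column “at various distances (feet)”; the function name `blindness_group` is hypothetical, and the sketch assumes that the person has already been counted as blind, since group 5 can be finally assigned only after examination by an eye physician.

```python
from fractions import Fraction

# Lower limits of groups 2 to 5, from the "at various distances (feet)" column.
LOWER_LIMITS = [Fraction(2, 200), Fraction(5, 200), Fraction(10, 200), Fraction(20, 200)]


def blindness_group(numerator, denominator):
    """Group (1 to 5) for a Snellen measure such as 5/200 or 20/800."""
    acuity = Fraction(numerator, denominator)
    group = 1
    for limit in LOWER_LIMITS:
        if acuity >= limit:  # each limit reached moves the person up one group
            group += 1
    return group


print(blindness_group(1, 200), blindness_group(20, 800), blindness_group(6, 120))
```

Because the measure is reduced to a fraction before comparison, readings taken at 20 feet or at 6 meters fall into the same group as the equivalent reading taken at other distances.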

The Snellen scale was developed by recording the visual per- 
ception of patients at varying distances. Visual perception is defined 




in terms of linear measurement. Other rating scales will be defined 
in other terms, but the aim is to find measures that can be applied 
with reliability and to express the results in quantitative form. 

3. CHAPIN’S SCALE FOR RATING LIVING ROOM EQUIPMENT

Sociologists have for a number of years been interested in de- 
veloping some measure of homes. The rural sociologists have 
made housing surveys for the purpose of determining that part of 
the farmer’s standard of living represented by the house he lives 
in. For the most part these surveys have not had the scientific 
value which it is desirable that they should have. The sociologist 
is interested in the home from the viewpoint of the standard of 
living of the family but perhaps more from the viewpoint of the 
home as a center of social interaction. Historically the concept of 
social interaction has been chiefly a descriptive term, but efforts 
are now being made to find some way of treating it as a variable 
in the statistical sense. In order to limit the size of his rating scale, 
Professor Chapin constructed a scale for the living room of the 
home only. He says: “The sociological assumptions underlying 
the Living Room Scale developed to measure socio-economic status 
as defined are: (1) the living room of a home is the room most 
likely to be the center of interaction of the family; (2) the living
room equipment reflects the cultural acquisitions, the possessions, 
and the socio-economic status of the family.” 4 Here is an illustra- 
tion of the necessity of precision in knowing just what the pro- 
posed scale is to measure. Bedrooms, kitchen, dining room, 
basement, etc., are not considered. The study is restricted to the 
significance of the living room in the home. Such careful definition 
of purpose is a necessity for worth-while results. The Living 
Room Scale is reproduced below: 

4 Chapin, F. S., “Socio-Economic Status: Some Preliminary Results of Measurement,”
Amer. Jour. Soc., January, 1932, p. 581.

Scale For Rating Living Room Equipment

DIRECTIONS TO VISITOR

1. The following list of items is for the guidance of the recorder. Not
all of the features listed will be found in any one home. Entries
on the schedules should, however, follow the order and numbering
indicated. Weights appear after the names of the respective items.
Disregard these weights in recording. Only when the list is finally
checked should the individual items be multiplied by these weights
and the sum of the weighted scores be computed, and then only
after leaving the home. All information is confidential.

2. Check or underline the articles or items present. If more than one, 
write 2, 3, or 4, as the case may be. 

3. Do not enter the score of any article or feature present. Complete 
recording before attempting to enter scores. 

4. In cases where the family has no real living room, but uses the 
room at nights as a bedroom, or during the day as a kitchen or 
as a dining room, or as both, in addition to use of room as the 
chief gathering place of the family, please note this fact clearly
and describe for what purposes the room is used. 

5. When possible it is desirable to have a living room checked twice. 
This may be done in either of two ways. 

a. After an interval of two or three weeks the same visitor may 
recheck the room. The first schedule should be marked I, the 
second II. 

b. After an interval or simultaneously the room may be checked 
by two different visitors. One schedule should be marked A, 
the other B. 

Scores of the same homes on two trials should be similar. If a group 
of homes are scored twice there should be a high correlation 
between the scores. Please report findings to F. Stuart Chapin, 
University of Minnesota. 
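
The check called for in direction 5 may be sketched as a coefficient of correlation between the scores of the same homes on two visits. The scores below are invented for illustration, and the function name `pearson_r` is hypothetical; the formula is the ordinary product-moment coefficient.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment coefficient of correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented grand total scores of eight homes, schedules I and II.
first_visit = [52, 61, 38, 75, 89, 44, 67, 58]
second_visit = [55, 59, 41, 72, 91, 47, 64, 60]

r = pearson_r(first_visit, second_visit)
print(round(r, 3))
```

A coefficient near +1 indicates that the two visits ranked and spaced the homes in nearly the same way.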

SCHEDULE OF LIVING ROOM EQUIPMENT

I. Fixed Features

1. Floor
Softwood 1, hardwood 2, composition 3, stone 4.

2. Floor covering
Composition 1, carpet 2, small rugs 3, large rug 4, oriental rug 6.

3. Wall covering
Paper 1, kalsomine 2, plain paint 3, decorative paint 4, wooden panels 5.

4. Woodwork
Painted 1, varnished 2, stained 3, oiled 4.

5. Door protection
Screen 1, storm door 1.

6. Windows
1 each window.

7. Window protection 1
Screen, blind, netting, storm sash, awning, shutter, 1 each.

8. Window covering 1
Shades 1, curtains 2, drapes 3.

9. Fireplace
Imitation 1, gas 2, wood 4, coal 4.

10. Fire utensils
Andirons, screen, poker, tongs, shovel, brush, hod, basket, rack, 1 each.

11. Heat
Stove 1, hot air 2, steam 3, hot water 4.

12. Artificial light
Kerosene 1, gas 2, electric 3.

13. Artificial ventilators 1

14. Clothes closets 1

Total Section I

II. Standard Furniture

15. Table
Sewing 1, writing 1, card 1, library, end, tea, 2 each.

16. Chair
Straight, rocker, arm chair, high chair, 1 each.

17. Stool or bench
High stool, footstool, piano stool, piano bench, 1 each.

18. Couch
Cot 1, sanitary couch 2, chaise longue 3, daybed 4, davenport 5, bed-davenport 6.





19. Desk
Business 1, personal-social 2.

20. Book case 1

21. Wardrobe or movable cabinet 1

22. Sewing cabinet 1

23. Sewing machine
Hand power 1, foot power 2, electric 3.

24. Rack or stand 1

25. Screen 1

26. Chests 1

27. Music cabinet 1

Total Section II

III. Furnishings and Cultural Resources

28. Covers
Furniture, table, chair, couch, piano, 1 each.

29. Pillows
Couch, floor, 1 each.

30. Lamps
Floor, bridge, table, 1 each.

31. Candle holders, 1 each

32. Clock
Mantel, grandfather, wall, alarm, 1 each.

33. Mirror, 1 each

34. Pottery, brass or metal
Factory made 1, hand made 2 each.

35. Baskets
Factory or hand made, waste, sewing, sandwich, decorative, 1 each.

36. Statues 1 each

37. Vases 1, flowers or plants, 2 each

38. Photographs 1 each (portraits of personal interest)

39. Pictures
Note if original or reproduction. If original, oil, water color, etching,
wood block, lithograph, crayon drawing, pencil drawing, pen and ink,
brush drawing, photograph (when treated as a work of art), 2 each; if
reproduction, photograph, half tone, color print, chromo, 1 each.

40. Books 2
Poetry, fiction, history, drama, biography, philosophy, essays,
literature, religion, art, science (physical, psychological, social),
atlas, dictionary, encyclopedia, .20 for each volume.

41. Newspapers 3
General, labor, local community, sectarian, 1 for each type of paper.

42. Periodicals 3
News (current events), professional, religious, literary, science, art,
children’s, 1 each; fraternal, fashion, or popular story, .50 each.

43. Telephone 3
Switchboard connection 1, two-party line 2, one-party line 3 (Note
social or business mainly.)

44. Radio 3
Crystal 1, one-tube 2, two-tube 3, three-tube 4, five-tube and up, 5.

45. Musical instruments 3
Piano 5, organ 1, violin 1, other hand instruments, 1 each.

46. Mechanical musical instruments 3
Music box 1, phonograph 2, player-organ 3, player-piano 4.

47. Sheet music 3
Opera, folk, military, ballads, classic, dance (other than jazz),
children’s exercises, .05 for each sheet; jazz, .01 for each sheet.

48. Phonograph records 3
Type of music (as above); type of instrument reproduced; voice: solo,
duet, quartet, chorus; instrumental: solo, instrument (piano, violin,
etc.), trio, quartet, band, orchestra, .10 for each record; jazz, .01
for each.

Total Section III

IV. Atmosphere and “Gestalt” of Room

49. Cleanliness of room and furnishings
a. Spotted or stained (-4)
b. Dusty (-2)
c. Spotless and dustless


1 If checked out of season, ascertain if used in season and so record. 

2 To be recorded if in another room (except professional library of doctor, 
lawyer, clergyman). 

3 To be recorded if in another room.





50. Orderliness of room and furnishings
a. Articles strewn about in disorder (-2)
b. Articles in place or in usable order (+2)

51. Condition of repair of articles and furnishings
a. Broken, scratched, frayed, ripped, or torn (-4)
b. Articles or furnishings patched up (-2)
c. Articles or furnishings in good repair and well kept (+2)

52. Record your general impression of good taste
a. Bizarre, clashing, inharmonious or offensive (-4)
b. Drab, monotonous, neutral, inoffensive (-2)
c. Attractive in a positive way, harmonious, quiet and restful (+2)

Total Section IV

Sums of Weighted Scores

Total Section I
Section II
Section III
Section IV
Grand Total

How Is the Living Room Scale Related to Other Criteria

This scale makes it possible to measure home environment in terms
of socio-economic status. The original study upon which the scale is
based, defined socio-economic status as the position that an individual
or a family occupies with reference to the prevailing average
standards of cultural possessions, effective income, material possessions,
and participation in group activity of the community. Effective
income was measured by the Sydenstricker-King scale; cultural possessions,
material possessions, and participation in group activity of
the community were each measured by separately devised scales.
The living room scale was then constructed as a simple measure
which showed high correlation with the composite scores of the
original four measures of socio-economic status.

VALIDITY

(1) 38 homes with Chapman-Sims scale, r = +.69 ± .08
(2) 18 homes with Holley, ρ = +.514
(3) 29 Minnesota Children’s Bureau cases with social worker’s judgments, biserial r = +.90
(4) 75 homes in New York with 60 environmental factors (Van Alstyne, p. 59) r = +.68 ± .04.

RANGES OF SCALE

Tester        Place             Number  Range    Mean  Social Class
Chapin        Minneapolis       38      20-89    50    Middle class 4
Taeuber       Minneapolis       46      60-359   163   Upper middle 4
Chapin        Twin Cities       29      25-108   62    Middle class 4
Van Alstyne   New York          75      20-200   76    Middle class 5
Conklin       Brooklyn, N. Y.   128     44-384   111   Upper middle 5

CORRELATION WITH OTHER FACTORS

Factor                                  Correlation  No. Cases  Investigator
Education of parents                    r = +.71     120        Skalet
Occupational status (Minnesota
  Occupational Classification)          r = +.74     120        Skalet
I. Q. of child 6                        r = +.46     70         Skalet
Child’s M.A. 6 7                        r = +.59     75         Van Alstyne
Mother’s intelligence                   r = +.65     75         Van Alstyne
Child’s vocabulary 7                    r = +.67     75         Van Alstyne

4 Detached and duplex houses.

5 Flats and apartment houses.

6 Four-year-olds.

7 Three-year-olds.


The Scale is divided into four sections, and each item has a 
weight assigned to it. The sums of the weighted scores for each 
section, then, give the relative importance of a particular home. 
The grand total weighted score gives a basis of comparing one 
living room with another and of arranging a large number of such 
measures in the form of a frequency distribution to compare one 
community with another. The Scale has been used to measure the 
socio-economic status of over six hundred homes, and a project is 
now under way to standardize it. 5
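
The computation of the section sums and the grand total may be sketched for a single room. The weights are those printed in the schedule; the room itself, the counts of articles, and the names `entries` and `section_totals` are invented for illustration.

```python
# (section, item, weight, count) for one hypothetical living room.
# Weights are as printed in the schedule; the counts are invented.
entries = [
    ("I", "hardwood floor", 2, 1),
    ("I", "large rug", 4, 1),
    ("I", "electric light", 3, 1),
    ("II", "straight chair", 1, 4),
    ("II", "davenport", 5, 1),
    ("III", "books (per volume)", 0.20, 40),
    ("III", "piano", 5, 1),
    ("IV", "dusty", -2, 1),
    ("IV", "articles in place", 2, 1),
]


def section_totals(entries):
    """Sum weight times count within each section of the schedule."""
    totals = {}
    for section, _item, weight, count in entries:
        totals[section] = totals.get(section, 0) + weight * count
    return totals


totals = section_totals(entries)
grand_total = sum(totals.values())
print(totals, grand_total)
```

The grand totals of many rooms, computed in this way, are the measures that would be arranged in a frequency distribution.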


4. WOODWORTH-MATTHEWS PSYCHONEUROTIC INVENTORY

In 1918, Professor R. S. Woodworth developed the Psycho- 
neurotic Inventory for the purpose of detecting psychopathic and 
neurotic tendencies among the soldiers of the American army. The 
original Inventory contained 116 questions, or statements. Dr.
Ellen Matthews eliminated 46 of the statements to adapt it to use
with school children. That left 70 statements, the form in which
she used it for investigations among normal school children. 6 In
his study of delinquent boys in four institutions in New York 
State, Dr. John Slawson used the Inventory to determine the 
psycho-neurotic status of delinquent boys as compared with that of 
non-delinquents. The Inventory is reproduced below: 

5 Chapin, op. cit., pp. 581, 586, 587.

6 Matthews, Ellen, “A Study of Emotional Stability in Children,” Jour.
Delinq., 1921, No. 8, pp. 1-40. The Inventory, as given below, has been
taken from Slawson, John, The Delinquent Boy, pp. 218-221. Boston:
Richard G. Badger, 1926.

Matthews Revision of the Psychoneurotic Inventory

1. Do you like to play by yourself better than to play with other boys? Yes No
2. Do other boys let you play with them? Yes No
3. Did you ever run away from home? Yes No
4. Did you ever want to run away from home? Yes No
5. Do people find fault with you much? Yes No
6. Do you think people like you as much as they do other people? Yes No
7. Does it make you uneasy to cross a bridge over water? Yes No
8. Do you mind going into a tunnel or subway? Yes No
9. Are you afraid of water? Yes No
10. Are you afraid during a thunder storm? Yes No
11. Do you feel like jumping off when you are on a high place? Yes No
12. Are you afraid of the dark? Yes No
13. Are you often frightened in the middle of the night? Yes No
14. Do you have a light in your room at night? Yes No
15. Do you ever cry out in your sleep? Yes No
16. Do you talk in your sleep? Yes No
17. Do you walk in your sleep? Yes No
18. Are you troubled with dreams about your play? Yes No
19. Do you ever have the same dream over and over? Yes No
20. Do you ever cry yourself to sleep? Yes No
21. Did you ever have the habit of picking your toes or your nose? Yes No
22. Did you ever have the habit of stuttering? Yes No
23. Can you sit still without fidgeting? Yes No
24. Did you ever have the habit of twitching your head, neck or shoulders? Yes No
25. Do you break and tear and spoil things more than other people? Yes No
26. Do you ever get so angry that you see red? Yes No
27. Do you stumble and fall over things more than other people? Yes No
28. Are you usually happy? Yes No
29. Do you ever feel that nobody loves you? Yes No
30. Do you ever wish you had never been born? Yes No
31. Do you ever wish you were dead? Yes No
32. Do you ever giggle over nothing at all? Yes No
33. Is it easy to get you cross over very small things? Yes No
34. Did you ever have a real fight? Yes No
35. Do you like to tease people till they cry? Yes No
36. Can you stand pain as quietly as others do? Yes No
37. Do you ever feel a certain pleasure in hurting a person or an animal? Yes No
38. Do you feel that you are a little bit different from other people? Yes No
39. Do you seem to have a harder time to get along in school than other boys do? Yes No
40. Do you ever feel that your parents are not really your own? Yes No
41. Do you ever have the feeling as if you were falling just before going to sleep? Yes No
42. Do you ever feel as if you were smothering? Yes No
43. Are you usually on time? Yes No
44. Do you usually feel well and strong? Yes No
45. Do you usually sleep well? Yes No
46. Do you feel well rested in the morning? Yes No
47. Do you feel sort of tired a good deal of the time? Yes No
48. Do you feel bored a good deal of the time? Yes No
49. Do your eyes often pain you? Yes No
50. Do you have many bad headaches? Yes No
51. Have you ever fainted away? Yes No
52. Does your family treat you right? Yes No
53. Do your teachers generally treat you right? Yes No
54. Are you ever bothered by a feeling that things are not real? Yes No
55. Are you ever troubled with the idea that somebody is following you? Yes No
56. Do you ever feel that someone is trying to do you harm? Yes No
57. Does it make you uneasy to cross a wide street or open square? Yes No
58. Does it make you uneasy to sit in a small room with the door shut? Yes No
59. Do you usually know just what you want to do next? Yes No
60. Do you have a hard time making up your mind about things? Yes No
61. Do you have a great fear of fire? Yes No
62. Do you ever have a strong desire to set fire to something? Yes No
63. Did you ever have a strong desire to steal things? Yes No
64. Do you think you have more fears than most people? Yes No
65. Do you make friends easily? Yes No
66. Do you get tired of people easily? Yes No
67. Have you any very strong superstitions? Yes No
68. Did you ever have a vision? Yes No
69. Did you ever feel that you were very wicked? Yes No
70. Do you consider yourself a very moody person? Yes No





The strong points in favor of the Inventory are the simplicity 
of the questions asked, the fact that the answers are either “yes” 
or “no,” the fact that the answers can be treated quantitatively by 
regarding the verbal responses as symptomatic of mental states, 
and the further fact that the scoring is not dependent upon the 
opinions of the scorer. While no claim is made that the Inventory 
is better than a psychiatric examination or that it should supersede 
such an examination, it has an advantage in that it provides a basis 
for quantitative comparison of the reactions of a special group, 
such as delinquent boys, with the reactions of an unselected group 
of non-delinquent children. The scores can be arranged in a fre- 
quency distribution and compared either graphically or in terms 
of averages and dispersions. However, the Inventory is open to 
all the criticisms to which any questionnaire is liable. There is no 
way of checking the veracity of the answers, and there is no way 
of knowing whether boys of all grades of intellectual ability un- 
derstand the questions alike. The reliability of the results of the 
Inventory depends upon the veracity of the answers and the question
of uniformity of understanding. 7

Every such scoring device as the Inventory requires “standard- 
ization.” The technique of standardizing a scale involves two 
procedures. First, the scale should be applied by different ob- 
servers to the same subject or subjects. How closely do the ratings 
of the different observers agree? The degree of correlation be- 
tween the results is a measure of the reliability, or internal con- 
sistency, of the rating scale. That is, if the coefficient of correlation 
is high, it indicates that different observers can apply the scale 
in the same way and obtain similar results. Second, other scales 
should be applied to the same subject or subjects. How closely do 
the ratings agree? The degree of correlation between results ob- 
tained on each of the rating scales and results obtained on the 
scale to be standardized is a measure of validity, or external con- 
sistency, of the rating scales. If the correlation is low, the question 
arises as to which scale is better. Obviously they do not measure 
the same thing, or, if they do, they do not measure it in the same
way. Some of the other tests suggested by Gould may then be
applied to the scale. 8

7 Slawson has discussed the strength and the weakness of the Inventory and
concludes that his results are reasonably reliable. See op. cit., pp. 221-223. 

8 See pp. 494, 495 above. 





It is important at an early stage in the use of a scale to deter- 
mine the form of distribution of the trait in question. Is it dis- 
tributed according to the normal curve of error, or is it distributed 
in the form of a skewed curve? That is, a norm must be estab- 
lished with which to compare other sample studies. This could 
be accomplished by taking a random sample, or unselected group 
of individuals, of sufficient size and applying the scale to them. It 
is desirable to take several random samples as a check on the 
validity of the guess that any particular sample is random. If the 
results in each case are similar, it may be assumed that the scale 
has been applied to similar samples and that the form of the dis- 
tribution of the trait is a satisfactory norm. This is on the assump- 
tion, of course, that the scale has been standardized for reliability 
and validity. 
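
A first inspection of the form of the distribution may be sketched with a single coefficient of skewness. The measure used here, three times the difference between mean and median divided by the standard deviation, is one common choice and is an assumption of this sketch; the two sets of scores and the function name `pearson_skewness` are invented for illustration.

```python
from statistics import mean, median, pstdev


def pearson_skewness(scores):
    """3 * (mean - median) / standard deviation; zero for a symmetrical sample."""
    return 3 * (mean(scores) - median(scores)) / pstdev(scores)


# Invented scores from two hypothetical samples.
symmetric = [8, 9, 10, 10, 11, 12]
skewed = [2, 3, 3, 4, 4, 5, 9, 18]

print(pearson_skewness(symmetric), pearson_skewness(skewed))
```

If several random samples yield coefficients of about the same sign and size, the form of the distribution may more safely be taken as the norm.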

Attention should be called to the fact that each question in the 
Inventory is given equal weight. This raises a different problem 
regarding rating scales: that of weighting the questions or state- 
ments. The justification for giving equal weight to all statements 
may be questioned. For example, questions 9 and 10 in the In- 
ventory are similar, but they involve stimuli which are different 
qualitatively and quantitatively: “Are you afraid of water?” and 
“Are you afraid during a thunder storm?” Do they equally reflect 
the mental stability of the individual? How could such a question 
be answered satisfactorily? Should the first be given a weight of 
two and the second a weight of one, or vice versa? Of course, the 
assumption regarding the differential importance of the questions 
is that with a large number of questions, many of which are 
similar, the necessity of a weighting scheme is eliminated. But that 
is an open question which the maker of rating scales should always 
take into account. 
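
The bearing of the weights upon the order of the individuals may be sketched with a few invented answers. The four questions, the five respondents, the double weight given to the first question, and the names `totals` and `ranking` are all hypothetical.

```python
# Rows are respondents; a 1 marks an answer counted as symptomatic, a 0 one
# that is not. Four hypothetical questions, five hypothetical respondents.
answers = [
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
]
equal = [1, 1, 1, 1]
unequal = [2, 1, 1, 1]  # hypothetical: the first question counted double


def totals(answers, weights):
    """Weighted score of each respondent."""
    return [sum(a * w for a, w in zip(row, weights)) for row in answers]


def ranking(scores):
    """Respondent indices from highest to lowest score (ties keep input order)."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


print(totals(answers, equal), ranking(totals(answers, equal)))
print(totals(answers, unequal), ranking(totals(answers, unequal)))
```

With so few questions the change of one weight alters the relative standing of two respondents; whether a long list of similar questions removes this effect is the open question raised above.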

5. MEASUREMENT OF ATTITUDE TOWARD THE CHURCH 

The scale worked out by Professors L. L. Thurstone and E. J.
Chave for measuring attitudes toward the church provides a good 
example of the method of constructing rating scales for attitudes 
and of methods of standardization. In planning this scale the first 
problem was to determine what opinions about the church actually 
exist. “Several groups of people and many individuals,” the au- 
thors explain, “were asked to write out their opinions about the 
church, and current literature was searched for suitable brief state- 
ments that might serve the purposes of the scale. By editing such 





material a list of 130 statements was prepared, expressive of atti- 
tudes covering as far as possible all gradations from one end of 
the scale to the other.” 9 Careful attention was given to selecting 
a list of opinions ranging all the way from complete confidence 
to complete antagonism. In the middle of the range would be 
found more or less neutral statements of opinion. Attention to the 
neutral opinions was of fundamental importance to prevent the 
scale from breaking into two parts and the scores being distributed 
in a U-shaped curve instead of in the form of a normal distribution. 

Certain practical criteria were applied to the first editing of the 
work. The most important were as follows: “(1) The statements 
should be as brief as possible so as not to fatigue the subjects who 
are asked to read the whole list. (2) The statements should be 
such that they can be indorsed or rejected in accordance with their 
agreement or disagreement with the attitude of the reader. Some 
statements in a random sample will be so phrased that the reader 
can express no definite indorsement or rejection of them. (3) 
Every statement should be such that acceptance or rejection of the 
statement does indicate something regarding the reader’s attitude 
about the issue in question. If, for example, the statement is made 
that war is an incentive to inventive genius, the acceptance or 
rejection of it really does not say anything regarding the reader’s 
pacifistic or militaristic tendencies. He may regard the statement 
as an unquestioned fact and simply indorse it as a fact, in which 
case his answer has not revealed anything concerning his own 
attitude on the issue in question. However, only the conspicuous 
examples of this effect should be eliminated by inspection, because 
an objective criterion is available for detecting such statements 
so that their elimination from the scale will be automatic. Personal 
judgment should be minimized as far as possible in this type of 
work. (4) Double-barreled statements should be avoided except 
possibly as examples of neutrality when better neutral statements 
do not seem to be readily available. Double-barreled statements 
tend to have a high ambiguity. (5) One must insure that at least 
a fair majority of the statements really belong on the attitude 
variable that is to be measured. If a small number of irrelevant 
statements should be either intentionally or unintentionally left in 
the series, they will be automatically eliminated by an objective 
criterion, but the criterion will not be successful unless the majority 
of the statements are clearly a part of the stipulated 
variable.” 10 

9 Op. cit., p. 22. 

A list of 130 was taken from the statements obtained from 
individuals and from literature. In order to arrive at an approximate 
gradation of the statements ranging from highest appreciation 
to highest depreciation of the church, 341 individuals were 
asked to arrange the statements in eleven groups, beginning with 
highest appreciation and ending with highest depreciation. The 
130 statements were mimeographed on small slips of paper, and 
each subject was given 11 master-slips lettered A to K. F fell in 
the middle, and to this master-slip were to be assigned all the 
statements regarded as neutral. Only the first, middle and last 
piles were given descriptions; within this range the subjects were 
to classify the opinions. The authors worked out scale values for 
each statement from this sorting. A few of the statements are 
given to illustrate the types used: 11 

1. I have seen no value in the church. 

2. I believe the modern church has plenty of satisfying interests 
for young people. 

3. I do not hear discussions in the church that are scientific or 
practical and so I do not care to go. 

4. I believe that membership in a good church increases one’s 
self-respect and usefulness. 

5. I believe a few churches are trying to keep up to date in their 
thinking and methods of work, but most are far behind the 
times. 

6. I regard the church as an ethical society promoting the best 
way of living for both an individual and for society. 
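
The text does not state the arithmetic by which scale values were derived from the sorting. As a rough illustration only, the sketch below assumes that a statement's scale value is taken as the median pile into which the judges placed it (A = 1 through K = 11), and that its ambiguity is gauged by the interquartile range of those placements. The function name, the sample judgments, and the choice of Python's `statistics` module are assumptions made for illustration; they are not offered as the authors' published procedure.

```python
from statistics import median, quantiles

# Eleven master-slips, A (highest appreciation) to K (highest depreciation).
PILES = "ABCDEFGHIJK"


def scale_value_and_spread(pile_letters):
    """Return (median pile position, interquartile range) for one statement.

    pile_letters: the pile letter chosen by each judge for this statement.
    Positions run from 1 (pile A) to 11 (pile K).
    """
    positions = [PILES.index(letter) + 1 for letter in pile_letters]
    # Quartiles by linear interpolation over the observed placements.
    q1, _, q3 = quantiles(positions, n=4, method="inclusive")
    return median(positions), q3 - q1


# Hypothetical judgments from nine judges for two statements.
agreed = scale_value_and_spread(list("AABABCABA"))     # judges cluster near A
scattered = scale_value_and_spread(list("BDFHJCGEI"))  # judges disagree widely
print(agreed, scattered)
```

Under these assumptions a statement on which the judges scatter widely shows a large interquartile range, and would be a candidate for rejection on the criterion of ambiguity.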

It will be noted that Thurstone and Chave used statements which 
were to be marked “yes” or “no,” while Woodworth-Matthews 
used questions. There may be a question as to which method is 
better, but extensive experimentation would be required to decide 
this. Furthermore, the two tests are seeking different things. 
Woodworth and Matthews are asking for a report of experience 
as a matter of fact, while Thurstone and Chave are asking for an 
expression of opinion. Thurstone and Chave regard opinions as 
symbolic of attitudes, and their study of attitudes is based upon 
the theory that an attitude is correctly represented by verbal opinions. 
This assumption might also be questioned, and the authors 
recognize that fact. 

10 Op. cit., pp. 22, 23. 

11 Op. cit., Chap. II. 

A final list of 45 statements was selected from the 130 opinions, 
after the criteria of ambiguity and irrelevance had been applied 
and after consideration of the scale values and careful inspection 
of the statements themselves. From this final study an “experi- 
mental attitude scale” was developed. The authors summarize 
their judgment regarding the scale thus: “The essential character- 
istic of the present measurement method is the scale of evenly 
graduated opinions so arranged that equal steps or intervals on 
the scale seem to most people to represent equally noticeable shifts 
in attitude.” 12 That is, a means has been devised for treating 
attitudes as continuous variables. This is an important step in the 
quantitative treatment of facts traditionally regarded as qualitative 
and subjective. It shows that any dogmatic skepticism about meas- 
urements in psychology and the social sciences is of doubtful 
validity and, as experimentation proceeds, may be proved largely 
unwarranted. 


6. EXERCISES 

1. Devise the following types of rating scales with the individual 
items appropriately weighted: 

(a) A scale for rating student room equipment. 

(b) A scale for rating student attitudes toward military train- 
ing in colleges. 

2. Obtain from the University of Minnesota a supply of Chapin’s 
Scale for Rating Living Room Equipment and make a survey 
of 100 or more living rooms in your college town. If each 
student does a certain number of these, the field work will not 
be laborious. Then the data on all schedules can be combined 
for analysis and comparison of homes. Compare your results 
with Chapin’s. 

3. Obtain from the University of Chicago Press a supply of 
Thurstone and Chave’s scale for measuring attitudes toward 
the church and get them filled out by 100 or more students. 
If each student in the class takes his pro rata of the forms, the 
time required for obtaining the original data will not be great. 
Returns may be pooled for analysis by each student. Compare 
your results with Thurstone and Chave’s. 

12 Op. cit., p. 82. 





7. REFERENCES 

Chapin, F. S., “A Quantitative Scale for Rating the Home and 
Social Environment of Middle Class Families in an Urban 
Community: A First Approximation to the Measurement of 
Socio-Economic Status,” Jour. Educ. Psych., No. 2, pp. 99-111. 

Chapin, F. S., “Socio-Economic Status: Some Preliminary Results of 
Measurement,” Amer. Jour. Soc., Vol. XXXVII, No. 4, pp. 581-587. 

Chapin, F. S., “The Meaning of Measurement in Sociology,” Pub. of the 
Amer. Soc. Soc., Vol. XXIV, pp. 83-94. 

Hartshorne, Hugh, and May, Mark A., Studies in Deceit, Chaps. 
III, IV, VIII, IX. 

Lundberg, George A., Social Research, Chaps. IX and X. 

McCormick, Mary J., The Measurement of Home Conditions, a 
pamphlet published by the National Catholic School of Social 
Service, Washington. 

Slawson, John, The Delinquent Boy, Chap. IV. 

Thurstone, L. L., and Chave, E. J., The Measurement of 
Attitude. 

Watson, G. B., The Measurement of Fair-Mindedness. 




APPENDICES 




APPENDIX A 


TABLE CXXI 1 

Ordinates of the normal probability curve expressed as fractional parts 
of the mean ordinate y₀. Each ordinate is erected at a given distance from 
the mean. The height of the ordinate erected at the mean can be computed from 

$$y_0 = \frac{N}{\sigma\sqrt{2\pi}} = \frac{N}{2.5066\,\sigma}$$

The corresponding height of any other ordinate can be read from the table 
by assigning the distance that the ordinate is from the mean (x). Distances 
on x are measured as fractional parts of σ. Thus the height of an ordinate 
at a distance from the mean of .7σ will be .78270 y₀; the height of an 
ordinate at 2.15σ from the mean will be .09914 y₀, etc. 
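
The entries of this table can be checked by direct computation. The sketch below assumes the usual form of the normal curve, in which the ordinate at a distance x from the mean is y₀ times exp(−x²/2σ²); the function names are illustrative and do not come from the text.

```python
import math


def mean_ordinate(n_cases, sigma):
    """Height y0 of the ordinate at the mean: N / (sigma * sqrt(2*pi))."""
    return n_cases / (sigma * math.sqrt(2.0 * math.pi))


def ordinate_fraction(x_over_sigma):
    """Ordinate at x/sigma as a fractional part of the mean ordinate y0."""
    return math.exp(-0.5 * x_over_sigma ** 2)


# The two illustrations given in the table heading.
print(round(ordinate_fraction(0.7), 5))
print(round(ordinate_fraction(2.15), 5))
# Height at the mean for a hypothetical distribution of N = 100, sigma = 1.
print(round(mean_ordinate(100, 1.0), 3))
```

Multiplying `ordinate_fraction` by `mean_ordinate` gives the actual height of the curve at any distance from the mean for a distribution of N cases.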


X/<T 

0 

1 

2 

3 

4 

5 

0 

7 

8 

9 

0.0 

100000 

09995 

99980 

99955 

99920 

99875 

99820 

99755 

99085 

99590 

0.1 

99501 

99390 

99283 

99158 

99025 

98881 I 

98728 

98505 

98393 

98211 

0.2 

98020 

97819 

97609 

97390 

97101 

90923 ! 

90070 

90420 

90150 

95882 

0.3 

95000 

95309 

95010 

94702 

94387 

94055 1 

93723 

93382 

93024 

92077 

0 4 

92312 

91399 

91558 

91169 

90774 

90871 

89901 

89543 

89119 

88088 

0.5 

88250 

87805 

87353 

80890 

80432 

85902 

85488 

85000 

84519 

84000 

0.0 

83527 

83023 

82514 

82010 

81481 

80957 

80429 

79890 

79359 

78817 

0.7 

78270 

77721 

77107 

70010 

70048 

75484 

74910 

74342 

73709 

73193 

0.8 

72015 

72033 

71448 

70801 

70272 

09081 

09087 

08493 

07890 

07298 

0.9 

00089 

60097 

05494 

04891 

04287 

03083 

03077 

02472 

01805 

01259 

1.0 

00053 

60047 

59440 

58834 

58228 

57023 

57017 

50414 

55810 

55209 

1.1 

54007 

54007 

53409 

52812 

52214 

51020 

51027 

50437 

49848 

49200 

1.2 

48075 

48092 

-47511 

40933 

40357 

45783 

45212 

44044 

44078 

43510 

1.3 

42950 

42399 

41845 

41294 

40747 

40202 

39061 

39123 

38509 

38058 

1.4 

37531 

37007 

36487 

35971 

35459 

34950 

34445 

33944 

33447 

32954 

1.5 

32405 

31980 

31500 

31023 

30550 

30082 

29018 

29158 

28702 

28251 

1.6 

27804 

27301 

20923 

20489 

20059 

25034 

25213 

24797 

24385 

23978 

1.7 

23575 

23170 

22782 

22392 

22008 

21027 

21251 

20879 

20511 

20148 

1.8 

19790 

19430 

19080 

18741 

18400 

18004 

17732 

17404 

17081 

10702 

1.9 

10448 

10137 

15831 

15530 

15232 

14939 

14050 

14304 

14083 

1 3806 

2.0 

13534 

13205 

13000 

12740 

12483 

12230 

11981 

11737 

11490 

11259 

2.1 

11025 

10795 

10570 

10347 

10129 

09914 

09702 

09495 

09290 

09090 

2.2 

08892 

08098 

08507 

08320 

08130 

07956 

07778 

07004 

07433 

07265 

2.3 

07100 

06939 

00780 

00024 

00471 

00321 

00174 

00029 

05888 

05750 

2.4 

05014 

05481 

05350 

05222 

05090 

04973 

04852 

04734 

04018 

04505 

2.5 

04394 

04285 

04179 

04074 

03972 

03873 

03775 

03680 

03580 

03494 

2.6 

03405 

03317 

03232 

03148 

03000 

02980 

02908 

02831 

02757 

02084 

2.7 

02012 

02542 

02474 

02408 

02343 

02280 

02218 

02157 

02098 

02040 

2.8 

01984 

01929 

01876 

01823 

01772 

01723 

01074 

01027 

01581 

01530 

2.9 

01492 

01449 

01408 

01307 

01328 

01288 

01252 

01215 

01 179 

01145 

3.0 

01111 

00819 

00598 

00432 

00309 

00219 

00153 

00100 

00073 

00050 

4.0 

5.0 

00034 

00000 

00022 

00015 

00010 

00006 

00004 

00003 

00002 

00001 

00001 


1 Rugg, H. O., Statistical Methods Applied to Education, p. 388. Boston: 
Houghton Mifflin Co., 1917. Reprinted by permission of the publishers. 






TABLE CXXII 1 

Fractional parts of the total area (10,000) under the normal probability 
curve, corresponding to distances on the baseline between the mean and 
successive points of division laid off from the mean. Distances are measured 
in units of the standard deviation, σ. To illustrate, the table is read 
as follows: between the mean ordinate, y₀, and any ordinate erected at a 
distance from it of, say, .8σ, is included 28.81 per cent of 
the entire area. 
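
As a check on the reading of this table, the area between the mean and any ordinate can be computed from the error function. The sketch below is an illustration under that assumption; the function names are invented for the purpose, and an older printed table may differ from the computed figure by a unit in the last place.

```python
import math


def area_mean_to(x_over_sigma):
    """Fraction of the total area lying between the mean and x/sigma."""
    return 0.5 * math.erf(abs(x_over_sigma) / math.sqrt(2.0))


def table_entry(x_over_sigma):
    """The same area expressed in parts of 10,000, as in the table."""
    return round(10000 * area_mean_to(x_over_sigma))


# The illustration in the heading: .8 sigma includes 28.81 per cent.
print(table_entry(0.8))
print(table_entry(1.0))
```

Doubling the result gives the area included between equal distances on either side of the mean.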


x/σ    .00   .01   .02   .03   .04   .05   .06   .07   .08   .09

0.0   0000  0040  0080  0120  0159  0199  0239  0279  0319  0359
0.1   0398  0438  0478  0517  0557  0596  0636  0675  0714  0753
0.2   0793  0832  0871  0910  0948  0987  1026  1064  1103  1141
0.3   1179  1217  1255  1293  1331  1368  1406  1443  1480  1517
0.4   1554  1591  1628  1664  1700  1736  1772  1808  1844  1879

0.5   1915  1950  1985  2019  2054  2088  2123  2157  2190  2224
0.6   2257  2291  2324  2357  2389  2422  2454  2486  2518  2549
0.7   2580  2612  2642  2673  2704  2734  2764  2794  2823  2852
0.8   2881  2910  2939  2967  2995  3023  3051  3078  3106  3133
0.9   3159  3186  3212  3238  3264  3289  3315  3340  3365  3389

1.0   3413  3438  3461  3485  3508  3531  3554  3577  3599  3621
1.1   3643  3665  3686  3708  3729  3749  3770  3790  3810  3830
1.2   3849  3869  3888  3907  3925  3944  3962  3980  3997  4015
1.3   4032  4049  4066  4083  4099  4115  4131  4147  4162  4177
1.4   4192  4207  4222  4236  4251  4265  4279  4292  4306  4319

1.5   4332  4345  4357  4370  4382  4394  4406  4418  4430  4441
1.6   4452  4463  4474  4485  4495  4505  4515  4525  4535  4545
1.7   4554  4564  4573  4582  4591  4599  4608  4616  4625  4633
1.8   4641  4649  4656  4664  4671  4678  4686  4693  4699  4706
1.9   4713  4719  4726  4732  4738  4744  4750  4758  4762  4767

2.0   4773  4778  4783  4788  4793  4798  4803  4808  4812  4817
2.1   4821  4826  4830  4834  4838  4842  4846  4850  4854  4857
2.2   4861  4865  4868  4871  4875  4878  4881  4884  4887  4890
2.3   4893  4896  4898  4901  4904  4906  4909  4911  4913  4916
2.4   4918  4920  4922  4925  4927  4929  4931  4932  4934  4936

2.5   4938  4940  4941  4943  4945  4946  4948  4949  4951  4952
2.6   4953  4955  4956  4957  4959  4960  4961  4962  4963  4964
2.7   4965  4966  4967  4968  4969  4970  4971  4972  4973  4974
2.8   4974  4975  4976  4977  4977  4978  4979  4980  4980  4981
2.9   4981  4982  4983  4984  4984  4984  4985  4985  4986  4986


1 Rugg, H. O., op. cit., p. 389. 



CHI-FUNCTION FOR PEARSON CHI TEST 429 
TABLE CXXIII 

Tables of the Chi-Function for the Pearson Chi Test 1 


χ²    n = 3           n = 4            n = 5            n = 6

1     .60653 06597    .80125 195(69)   .90979 598(96)   .96256 577(32)
2     .36787 94412    .57240 670(44)   .73575 888(23)   .84914 503(60)
3     .22313 01601    .39162 517(63)   .55782 540(04)   .69998 583(59)
4     .13533 52832    .26146 412(99)   .40600 584(97)   .54941 595(12)
5     .08208 49986    .17179 714(43)   .28729 749(52)   .41588 018(72)
6     .04978 70684    .11161 022(51)   .19914 827(35)   .30621 891(86)
7     .03019 73834    .07189 777(25)   .13588 822(54)   .22064 030(80)
8     .01831 56389    .04601 170(57)   .09157 819(44)   .15623 562(76)
9     .01110 89965    .02929 088(65)   .06109 948(10)   .10906 415(79)
10    .00673 79470    .01856 612(57)   .04042 768(20)   .07523 523(64)
11    .00408 67714    .01172 587(55)   .02656 401(44)   .05137 998(34)
12    .00247 87522    .00738 316(05)   .01735 126(52)   .03478 778(05)
13    .00150 34392    .00463 660(55)   .01127 579(39)   .02337 876(81)
14    .00091 18820    .00290 515(28)   .00729 505(57)   .01560 941(61)
15    .00055 30844    .00181 664(90)   .00470 121(71)   .01036 233(79)
16    .00033 54626    .00113 398(42)   .00301 916(37)   .00684 407(35)
17    .00020 34684    .00070 674(24)   .00193 294(95)   .00449 979(70)
18    .00012 34098    .00043 984(97)   .00123 409(80)   .00294 640(46)
19    .00007 48518    .00027 339(89)   .00078 594(42)   .00192 213(68)
20    .00004 53999    .00016 974(16)   .00049 939(92)   .00124 972(97)
21    .00002 75364    .00010 527(62)   .00031 666(92)   .00081 005(96)
22    .00001 67017    .00006 523(11)   .00020 042(04)   .00052 359(83)
23    .00001 01301    .00004 038(30)   .00012 662(62)   .00033 756(61)
24    .00000 61442    .00002 498(00)   .00007 987(48)   .00021 711(29)
25    .00000 37267    .00001 544(05)   .00005 030(98)   .00013 933(73)
26    .00000 22603    .00000 953(74)   .00003 164(46)   .00008 923(60)
27    .00000 13710    .00000 600(96)   .00001 987(89)   .00005 716(47)
28    .00000 08315    .00000 361(89)   .00001 247(29)   .00003 638(57)
29    .00000 05043    .00000 223(94)   .00000 781(74)   .00002 318(76)
30    .00000 03059    .00000 137(09)   .00000 489(44)   .00001 473(95)
40    .00000 00021    .00000 001(07)   .00000 004(12)   .00000 014(93)
50    .00000 00000    .00000 000(00)   .00000 000(03)   .00000 000(13)
60    .00000 00000    .00000 000(00)   .00000 000(00)   .00000 000(00)
70    .00000 00000    .00000 000(00)   .00000 000(00)   .00000 000(00)



1 Computed by Miss Anna M. Lescisin, Indiana University, to 10 decimal places. 
The last two places in parentheses indicate some lack of confidence in these figures. 
The following errors are to be noted: 

χ²    Pearson's Value (n = 12)    Our Value (n = 12)
7     .799073                     .799083
12    .362642                     .363643
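
The entries of Table CXXIII can be recomputed for comparison. The table's first column (n = 3) equals exp(−χ²/2), which indicates that n here corresponds to n − 1 degrees of freedom. On that reading, the sketch below evaluates the probability by the standard finite series for the chi-square tail, one form for even and one for odd degrees of freedom. It is an illustration only: the function name is invented, and it is not a reproduction of the method by which the printed figures were computed.

```python
import math


def pearson_chi_p(chi_sq, n):
    """Probability of a chi-square as large as chi_sq, for table argument n.

    Assumes n - 1 degrees of freedom, as the n = 3 column suggests.
    """
    df = n - 1
    if df < 1:
        raise ValueError("n must be at least 2")
    half = chi_sq / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * (1 + h + h^2/2! + ... + h^(df/2 - 1)/(df/2 - 1)!)
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: two-tailed normal area beyond chi, plus a finite series in
    # chi^(2r-1) / (1*3*5*...*(2r-1)) for r = 1 .. (df - 1)/2.
    chi = math.sqrt(chi_sq)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= chi_sq / (2 * r - 1)
        total += term
    tail = math.erfc(chi / math.sqrt(2.0))
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


# Spot checks against the first rows of the table.
print(round(pearson_chi_p(1, 3), 10))
print(round(pearson_chi_p(2, 5), 10))
print(round(pearson_chi_p(7, 12), 6))
```

For very large χ² the odd-df form may lose some relative accuracy in floating point, so the smallest tabled entries are best compared only to a few significant figures.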





X 2 

ft = 7 

ft = 8 

ft = 9 

ft = 10 

1 

.98561 282 ( 20 ) 

.99482 853 ( 65 )- 

.99824 837 ( 74 ) 

.99943 750 ( 26 ) 

2 

.91909 800 ( 29 ) 

.95984 036 ( 87 ) 

.98101 184 ( 31 ) 

.99116 760 ( 65 ) 

3 

.80884 688 ( 05 ) 

.83500 223 ( 17 ) 

.93435 754 ( 56 ) 

.96429 497 ( 27 ) 

4 

.67667 641 ( 62 ) 

.77977 740 ( 84 ) 

.85712 346 ( 05 ) 

.91141 252 ( 67 ) 

5 

.54381 311 ( 59 ) 

.65996 323 ( 00 ) 

.75757 613 ( 31 ) 

.83430 826 ( 07 ) 

6 

.42819 008 ( 11 ) 

.53974 935 ( 08 ) 

.64723 188 ( 88 ) 

.73991 829 ( 27 ) 

7 

.32084 719 ( 89 ) 

.42887 985 ( 77 ) 

.53663 266 ( 80 ) 

.63711 940 ( 74 ) 

8 

.23810 330 ( 56 ) 

.33259 390 ( 26 ) 

.43347 012 ( 03 ) 

.53414 621 ( 68 ) 

9 

.17357 807 ( 09 ) 

.25265 604 ( 65 ) 

.34229 595 ( 58 ) 

.43727 418 ( 87 ) 

10 

.12465 201 ( 95 ) 

.18857 345 ( 78 ) 

.26502 591 ( 53 ) 

.35048 520 ( 26 ) 

11 

.08837 643 ( 24 ) 

.13861 902 ( 08 ) 

.20169 919 ( 87 ) 

.27570 893 ( 67 ) 

12 

.06196 880 ( 44 ) 

.10055 886 ( 85 ) 

.15120 388 ( 28 ) 

.21330 930 ( 51 ) 

13 

.04303 594 ( 69 ) 

.07210 839 ( 10 ) 

.11184 961 ( 16 ) 

.16260 626 ( 22 ) 

14 

.02963 616 ( 39 ) 

.05118 135 ( 34 ) 

.08176 541 ( 63 ) 

.12232 522 ( 80 ) 

15 

.02025 671 ( 51 ) 

.03599 940 ( 48 ) 

.05914 545 ( 98 ) 

.09093 597 ( 66 ) 

16 

.01375 396 ( 77 ) 

.02511 635 ( 89 ) 

.04238 011 ( 41 ) 

.06688 158 ( 26 ) 

17 

.00928 324 ( 43 ) 

.01739 618 ( 25 ) 

.03010 907 ( 97 ) 

.04871 597 ( 63 ) 

18 

.00623 219 ( 51 ) 

.01197 000 ( 23 ) 

.02122 648 ( 63 ) 

.03517 353 ( 94 ) 

19 

.00416 363 ( 30 ) 

.00838 734 ( 10 ) 

.01485 964 ( 77 ) 

.02519 289 ( 50 ) 

20 

.00276 939 ( 57 ) 

.00556 968 ( 23 ) 

.01033 605 ( 07 ) 

.01791 240 ( 37 ) 

21 

.00183 461 ( 59 ) 

.00377 015 ( 01 ) 

.00714 742 ( 96 ) 

.01205 042 ( 13 ) 

22 

.00121 087 ( 33 ) 

.00254 041 ( 40 ) 

.00491 586 ( 73 ) 

.00887 897 ( 75 ) 

23 

.00079 647 ( 86 ) 

.00170 458 ( 70 ) 

.00336 424 ( 63 ) 

.00619 629 ( 64 ) 

24 

.00052 225 ( 81 ) 

.00113 935 ( 12 ) 

.00229 179 ( 12 ) 

. 00430 131 ( 09 ) 

25 

.00034 145 ( 46 ) 

.00075 880 ( 38 ) 

.00155 455 ( 79 ) 

.00297 118 ( 41 ) 

26 

.00022 264 ( 24 ) 

.00050 366 ( 86 ) 

.00105 029 ( 97 ) 

.00204 298 ( 97 ) 

27 

.00014 480 ( 76 ) 

.00033 340 ( 23 ) 

.00070 698 ( 65 ) 

. 001 39 889 ( 00 ) 

28 

.00009 396 ( 27 ) 

.00021 987 ( 94 ) 

.00047 424 ( 85 ) 

. 00095 385 ( 41 ) 

29 

.00006 083 ( 69 ) 

.00014 468 ( 69 ) 

.00031 709 ( 81 ) 

.00064 804 ( 12 ) 

30 

.00003 930 ( 84 ) 

.00009 495 ( 06 ) 

.00021 137 ( 85 ) 

.00043 871 ( 26 ) 

40 

.00000 045 ( 34 ) 

.00000 125 ( 87 ) 

.00000 320 ( 16 ) 

.00000 759 ( 84 ) 

50 

.00000 000 ( 47 ) 

.00000 001 ( 44 ) 

.00000 004 ( 09 ) 

.00000 010 ( 77 ) 

60 

.00000 000 ( 00 ) 

.00000 000 ( 02 ) 

.00000 000 ( 05 ) 

.00000 000 ( 13 ) 

70 

.00000 000 ( 00 ) | 

.00000 000 ( 00 ) 

.00000 000 ( 00 ) 

.00000 000 ( 00 ) 





X 2 

n= 11 

n - 

= 12 

n = 13 

n = 14 

1 

.99982 788(44) 

.99994 

961(00) 

.99998 583(51) 

.09999 616(52) 

2 

.99634 015(31) 

.99849 

588(16) 

.99940 581(51) 

.99977 374(98) 

3 

.98142 400(38) 

.99072 

588(63) 

.99554 401(93) 

.99793 431(73) 

4 

.94734 698(27) 

.96991 

702(37) 

.98343 639(15) 

.99119 138(03) 

5 

.89117 801(89) 

.93116 

661(10) 

.95797 896(18) 

.97519 313(39) 

6 

.81526 324(46) 

.87336 

425(39) 

. 9i(>08 205(80) 

.94015 290(01) 

7 

.72544 495(35) 

.79908 

350(16) 

.85761 355(34) 

.90215 150(16) 

8 

.62883 693(51) 

.71330 

382(93) 

.78513 038(09) 

.81300 027(48) 

9 

.53210 357(63) 

.62189 

233(10) 

.70293 043(47) 

.77294 353(83) 

10 

.44049 328(51) 

.53038 

714(13) 

.61596 005(48) 

.09393 435(82) 

11 

.35751 800(24) 

.44326 

327(82) 

.52891 868(04) 

.01081 701(97) 

12 

.28505 650(03) 

. 36364 

322(05) 

.44507 964(13) 

.52761 385(54) 

13 

.22367 181(68) 

.29332 

540(93) 

.36904 008(30) 

.44781 107(41) 

14 

.17299 100(79) 

.23299 

347(74) 

.30070 827(62) 

.37384 397(60) 

15 

.13206 185(63) 

. 18249 

692(96) 

.24143 645(10) 

.30735 277(37) 

16 

.09903 240(69) 

.14113 

086(91) 

.19123 007(53) 

.24912 983(01) 

17 

.07430 397(98) 

. 10787 

558(68) 

.14959 731(00) 

.19930 407(58) 

18 

.05496 364(15) 

.08158 

061(36) 

.11509 052(09) 

.15751 940(23) 

19 

.04026 268(23) 

.06109 

350(92) 

.08852 844(83) 

. 12310 300(09) 

20 

.02925 268(81) 

.04534 

067(37) 

.06708 590(29) 

.09521 025(54) 

21 

.02109 356(56) 

.03337 

105(44) 

.05038 045(10) 

.07292 802(05) 

22 

.01510 460(07) 

.02437 

324(38) 

.03751 981(41) 

.05530 177(04) 

23 

.01074 057(84) 

.01767 

510(94) 

.02772 594(22) 

.04107 020(37) 

24 

.00700 039(07) 

.01273 

320(34) 

.02034 102(90) 

.03113 005(98) 

25 

.00534 550(55) 

.00911 

668(47) 

.01482 287(47) 

.02308 373(18) 

26 

.00374 018(59) 

.00618 

991(72) 

.01073 388(99) 

.01700 083(08) 

27 

.00260 434(03) - 

.00459 

532(06) 

.00772 719(57) 

.01244 118(45) 

28 

.00180 524(88) 

.00323 

733(11) 

.00553 204(90) 

.00904 981(79) 

29 

.00121 604(48) 

.00226 

996(07) 

.00393 999(04) 

.00054 593(03) 

30 

. 00085 664(12) 

.00158 

458(60) 

.00279 242(92) 

.00470 909(53) 

40 

.00001 694(26) 

.00003 

577(50) 

.00007 190(08) 

.00013 823(54) 

50 

. 00000 026(69) 

.00000 

062(59) 

.00000 139(71) 

. (MK)00 298(14) 

60 

00000 000(36) 

. 00000 

000(93) 

. 00000 002(20) 

.00000 005(25) 

70 

.00000 000(00) 

.00000 

000(01) j 

.00000 000(03) 

.00000 000(08) 





X* 

n = 15 

n = 16 

71 = 17 

71 = 18 

1 

.99999 899 ( 76 ) 

.99999 974 ( 64 ) 

.99999 993 ( 78 ) 

.99999 998 ( 51 ) 

2 

.99991 675 ( 88 ) 

.99997 034 ( 49 ) 

.99998 975 ( 08 ) 

.99999 655 ( 76 ) 

3 

.99907 400 ( 81 ) 

.99959 780 ( 14 ) 

.99983 043 ( 43 ) 

.99993 049 ( 82 ) 

4 

.99546 619 ( 45 ) 

.99773 734 ( 40 ) 

.99890 328 ( 10 ) 

.99948 293 ( 27 ) 

5 

.98581 268 ( 80 ) 

.99212 641 ( 19 ) 

.99575 330 ( 45 ) 

.99777 083 ( 79 ) 

6 

.96649 146 ( 48 ) 

.97974 774 ( 76 ) 

.98809 549 ( 63 ) 

.99318 566 ( 26 ) 

7 

.93471 190 ( 33 ) 

.95764 974 ( 76 ) 

.97326 107 ( 83 ) 

.98354 890 ( 12 ) 

8 

.88932 602 ( 14 ) 

.92378 270 ( 28 ) 

.94886 638 ( 40 ) 

.96654 676 ( 94 ) 

9 

.83105 057 ( 86 ) 

.87751 745 ( 11 ) 

.91341 352 ( 82 ) 

.94026 179 ( 87 ) 

10 

.76218 346 ( 30 ) 

.81973 990 ( 96 ) 

.86662 832 ( 59 ) 

.90361 027 ( 73 ) 

11 

.68603 598 ( 02 ) 

.75259 437 ( 02 ) 

.80948 528 ( 25 ) 

.85656 398 ( 72 ) 

12 

.60630 278 ( 23 ) 

.67902 905 ( 67 ) 

.74397 976 ( 03 ) 

.80013 721 ( 78 ) 

13 

.52652 362 ( 26 ) 

.60229 793 ( 88 ) 

.67275 778 ( 02 ) 

.73618 603 ( 49 ) 

14 

.44971 105 ( 59 ) 

.52552 912 ( 95 ) 

.59871 383 ( 57 ) 

.66710 193 ( 89 ) 

15 

.37815 469 ( 44 ) 

.45141 720 ( 81 ) 

.52463 852 ( 65 ) 

.59548 164 ( 24 ) 

16 

.31337 429 ( 98 ) 

.38205 162 ( 82 ) 

.45296 084 ( 21 ) 

.52383 487 ( 84 ) 

17 

.25617 786 ( 12 ) 

.31886 440 ( 74 ) 

.38559 710 ( 17 ) 

.45436 611 ( 65 ) 

18 

.20678 083 ( 99 ) 

.26266 556 ( 05 ) 

.32389 696 ( 44 ) 

.38884 087 ( 72 ) 

19 

.16494 924 ( 43 ) 

.21373 388 ( 26 ) 

.26866 318 ( 18 ) 

.32853 216 ( 35 ) 

20 

.13014 142 ( 10 ) 

.17193 268 ( 88 ) 

.22022 064 ( 68 ) 

.27422 926 ( 67 ) 

21 

.10163 250 ( 05 ) 

.13682 931 ( 99 ) 

.17851 057 ( 49 ) 

.22629 029 ( 06 ) 

22 

.07861 437 ( 21 ) 

. 10780 390 ( 86 ) 

.14319 153 ( 47 ) 

.18471 903 ( 57 ) 

23 

.06026 972 ( 28 ) 

.08413 984 ( 45 ) 

.11373 450 ( 53 ) 

.14925 066 ( 84 ) 

24 

.04582 230 ( 72 ) 

.06509 348 ( 69 ) 

.08950 449 ( 75 ) 

.11943 497 ( 03 ) 

25 

.03456 739 ( 39 ) 

.04994 343 ( 75 ) 

.06982 546 ( 38 ) 

.09470 961 ( 38 ) 

26 

.02588 691 ( 53 ) 

.03802 267 ( 61 ) 

.05402 824 ( 82 ) 

.07446 053 ( 08 ) 

27 

.01925 362 ( 03 ) 

.02873 644 ( 02 ) 

.04148 315 ( 34 ) 

.05806 790 ( 06 ) 

28 

.01422 795 ( 80 ) 

.02156 902 ( 04 ) 

.03161 977 ( 49 ) 

.04493 819 ( 83 ) 

29 

.01045 035 ( 87 ) 

.01608 463 ( 15 ) 

.02393 612 ( 18 ) 

.03452 612 ( 06 ) 

30 

.00763 189 ( 92 ) 

.01192 148 ( 60 ) 

.01800 219 ( 20 ) 

.02634 506 ( 73 ) 

40 

.00025 512 ( 04 ) 

.00045 339 ( 40 ) 

.00077 858 ( 80 ) 

.00129 409 ( 44 ) 

50 

.00000 610 ( 63 ) 

.00001 204 ( 12 ) 

.00002 292 ( 48 ) 

.00004 224 ( 03 ) 

60 

.00000 018 ( 95 ) 

.00000 025 ( 22 ) 

.00000 059 ( 55 ) 

.00000 105 ( 09 ) 

70 

.00000 000 ( 19 ) 

.00000 000 ( 37 ) 

.00000 001 ( 00 ) 

.00000 002 ( 16 ) 







X* »=19 n= 20 n- 21 » = 22 


1 .99999 999(66) 

2 .99999 887(48) 

3 .99997 226(42) 

4 .99976 255(27) 

5 .99885 974(71) 

6 .99619 700(81) 

7 .99012 634(23) 

8 .97863 656(53) 

9 .96974 268(74) 

10 .93190 636(53) 

11 .89435 667(78) 

12 .84723 749(38) 

13 .79157 303(33) 

14 .72909 126(79) 

15 .66196 711(92) 

16 .59254 738(44) 

17 .52310 504(49) 

18 .45565 260(45) 

19 .39182 348(26) 

20 .33281 967(91) 

21 .27941 304(74) 

22 .23198 513(32) 

23 . 19059 013(01) 

24 .15502 778(29) 

25 .12491 619(79) 

26 . 09975 791(41) 

27 .07899 549(06) 

28 .06205 545(45) 

29 .04837 906(72) 

30 . 03744 649(10) 

40 . 00208 725(70) 

50 . 00007 548(26) 

60 . 00000 211(82) 

70 . 00000 004(52) 


.99999 999(92) .99999 
.99999 964(15) .99999 
.99998 920(94) .99999 
.99989 365(95) .99995 
.99943 096(32) .99972 

.99792 845(61) .99889 
.99421 325(85) .99668 
.98667 098(89) .99186 
.97347 939(45) .98290 
.95294 578(77) .96817 

.92383 844(53) .94622 
.88562 533(15) .91607 
.83857 104(69) .87738 
.78369 131(12) .83049 
.72259 731(97) .77640 

.65727 793(65) .71662 
.58986 782(45) .65297 
.52243 827(24) .58740 
.45683 612(43) .52182 
.39457 818(17) .45792 

.33680 090(00) .39713 
.28425 625(90) .34051 
.23734 178(30) .28879 
.19615 235(87) .24239 
.16054 222(60) .20143 

.13018901(46) .16581 
.10465316(12) .13526 
.08342 860(90) .10939 
.06598 513(15) .08775 
.05179 844(62) .06985 

.00327 221(30) .00499 
.00013 106(12) .00022 
.00000 386(98) .00000 
.00000 009(19) .00000 


999(98) .99999 999(99) 

988(85) .99999 996(61) 

590(25) .99999 847(96) 

350(19) .99998 012(83) 

264(79) .99986 783(83) 

751(20) .99942 618(03) 

505(61) .99814 223(22) 

775(69) .99514 434(45) 

726(70) .98921 404(51) 

194(28) .97891 184(68) 

253(05) .96278 681(57) 

598(28) .93961 782(44) 

404(94) .90862 395(00) 

593(74) .86959 927(03) 

761(31) .82295 180(17) 

431(09) .76965 103(81) 

365(78) .71110 620(38) 

824(45) .64900 422(58) 

602(24) .58514 008(51) 

971(48) .52126 125(02) 

259(87) .45894 420(52) 

068(25) .39950 988(60) 

453(95) .34397 839(55) 

216(34) .29305 853(34) 

110(65) .24716 408(41) 

187(60) .20644 904(49) 

399(63) . 17085 326(84) 

984(50) .14015 131(95) 

938(83) .11400 151(65) 

365(61) .09198 799(17) 

541(03) .00743 667(32) 

147(66) .00036 480(05) 

719(39) .00001 277(17) 

018(21) .00000 035(14) 





X 2 

n — 23 

n = 24 

n— 25 

n— 26 

] 

.99999 999(99) 

.99999 999(99) 

.99999 999(99) 

.99999 999(f 

2 

.99999 998(99) 

.99999 999(70) 

.99999 999(91) 

.99999 999 (t 

3 

.99999 944(83) 

.99999 980(39) 

.99999 993(18) 

.90900 097(1 

4 

.99999 169(18) 

.99999 659(85) 

.99999 863(54) 

.00900 046 (5 

5 

.99993 837(31) 

.99997 185(62) 

.99998 740(15) 

.99999 446 (i 

6 

.99970 766(32) 

.99985 410(16) 

.90092 861(35) 

.99996 573 ('. 

7 

.99898 060(00) 

.99945 189(02) 

.00971 100(82) 

.90985 048(1 

8 

.99716 023(36) 

.99837 228(95) 

.99908 477(06) 

.00049 505 a 

9 

.99333 132(78) 

.99595 746(68) 

.00750 571(63) 

.00850 619C; 

10 

.98630 473(15) 

.99127 663(54) 

.00454 600(82) 

.99665 263 (( 

11 

.97474 874(95) 

.98318 834(31) 

.08001 185(00) 

.99294 559 (i 

12 

.95737 907(02) 

.97047 067(75) 

.07000 803(63) 

.08656 7810 

14 

.93310 120(99) 

.95199 003(28) 

.06612 014(11) 

.07650 1290 

14 

.90147 920(01) 

.92687 124(27) 

.04665 037(70) 

.96173 244(1 

15 

.86223 798(36) 

.89463 357(45) 

.02075 860(07) 

.04138 2550 

10 

.81588 585(21) 

.85526 863(92) 

.88807 606(30) 

.01482 8700 

17 

.70336 197(88) 

.80925 155(83) 

.81866 204(50) 

.88170 377 (( 

18 

.70598 832(06) 

.75748 932(86) 

.80300 838(20) 

.84230 071 (5 

19 

.04532 843(52) 

.70122 462(06) 

.75108 060(00) 

.70712 0540 

20 

.58303 975(06) 

.64191 179(15) 

.60677 614(68) 

.74682 5300 

21 

.52073 812(75) 

.58108 751(03) 

.63872 522(33) 

.60260 9650 

22 

.45988 878(07) 

.52025 178(10) 

.57926 689(09) 

.63574 4020 

23 

.10172 961(04) 

.46077 087(57) 

.51070 800(34) 

.57756 335 (f 

24 

.34722 942(00) 

.40380 844(65) 

.46150 733(63) 

.51037 3570 

25 

.29707 473(13) 

.35028 534(37) 

.40576 068(10) 

.46237 3660 

20 

.25168 202(05) 

.30086 622(54) 

.35316 403(16) 

.40750 860 (C 

27 

.21122 047(90) 

.25596 769(19) 

.30445 316(24) 

.35588 462(3 

28 

.17568 199(16) 

.21578 160(01) 

.26004 108(74) 

.30785 324(0 

29 

.14486 085(38) 

.18030 985(77) 

.22013 006(75) 

.26301 602(7 

30 

.11846 440(38) 

.14940 102(81) 

.18475 178(70) 

.22428 807 (C 

40 

.01081 171(68) 

.01536 897(83) 

.02138 681(05) 

.02916 420(1 

50 

.00058 640(16) 

.00092 132(26) 

.00141 597(28) 

.00213 115(3 

00 

.00002 242(10) 

.00003 820(56) 

.00006 304(92) 

.00010 455(4 

70 

.00000 066(14) 

.00000 121(61) 

. 00000 218(65) 

.00000 384(7 





X 2 

n — 27 

n = 28 

n = 29 

n = 30 

1 

.99999 999(99) 

.99999 999(99) 

.99999 999(99) 

.09990 999(99) 

2 

.99999 999(99) 

.99999 999(99) 

.99999 999(99) 

.99999 999(99) 

3 

.99999 999(22) 

.99999 999(74) 

.99999 999(92) 

.99999 999(97) 

4 

.99999 979(27) 

.99999 992(12) 

.99999 997(07) 

.99999 998(91) 

5 

.99999 771(58) 

.99999 899(13) 

.99999 968(01) 

.99999 982(88) 

6 

.99998 385(11) 

.99999 252(42) 

.99999 059(82) 

.99999 847(85) 

7 

.99992 401(22) 

.99996 208(73) 

.99998 139(75) 

.99999 102(21) 

8 

.99972 628(29) 

.99985 433(73) 

.99992 307(13) 

.99996 079(19) 

9 

.99919 486(20) 

.99954 613(99) 

.99974 841(25) 

.99986 278(76) 

10 

.99798 114(85) 

.99880 302(90) 

.99930 201(01) 

.99959 947(28) 

11 

.99554 911(75) 

.99723 878(63) 

.99831 488(07) 

.99898 786(41) 

12 

.99117 251(63) 

.99429 444(57) 

.99037 150(71) 

.99772 850(24) 

13 

.98397 335(80) 

.98924 715(43) 

.99289 981(04) 

.99538 404(86) 

14 

.97300 022(67) 

.98125 471(54) 

.98718 800(74) 

.99137 737(52) 

15 

.95733 413(26) 

.96943 194(61) 

.97843 534(91) 

.98501 494(02) 

16 

.93620 287(18) 

.95294 715(46) 

.96581 930(89) 

.97553 586(27) 

17 

.90908 299(53) 

.93112 248(54) 

.94858 895(54) 

.96218 130(19) 

18 

.87577 342(96) 

.90351 971(04) 

.92014 923(12) 

.94427 237(51) 

10 

.83642 970(66) 

.87000 144(09) 

.89813 593(12) 

.92128 799(99) 

20 

.79155 647(69) 

.83075 611(69) 

.86440 442(32) 

.89292 708(80) 

21 

.74196 393(21) 

.78628 826(28) 

.82534 904(31) 

.85914 939(95) 

22 

.68869 681(98) 

.73737 720(58) 

.78129 137(50) 

.82018 942(45) 

23 

.03294 705(64) 

.68501 243(77) 

.73304 030(98) 

.77654 313(69) 

24 

.57596 525(26) 

.63031 609(48) 

.08153 503(09) 

.72893 166(96) 

25 

.51897 521(19) 

.57446 199(50) 

.62783 533(79) 

.67824 748(16) 

20 

.46310 474(55) 

.51860 045(36) 

.57304 455(93) 

.62549 104(05) 

27 

.40933 318(11) 

.46379 491(03) 

.51824 704(67) 

.57170 519(67) 

2S 

.35846 003(25) ' 

.41097 348(97) 

.40444 900(50) 

.51791 300(14) 

29 

.31108 235(48) 

.36089 918(32) 

.41252 813(30) 

.46506 627(69) 

30 

.26761 101(60) 

.31415 380(21) 

.30321 781(87) 

.41400 360(46) 

40 

.03901 199(08) 

.05123 679(26) 

.00012 703(88) 

.08393 679(44) 

50 

.00314 412(10) 

.00455 081(48) 

.00646 748(31)

.00903 166(94)

60

.00016 776(98) 

.00026 379(32) 

.00040 735(59) 

.00061 765(60) 

70 

.00000 663(45) 

.00001 121(69) 

.00001 861(00)

.00003 032(18) 



APPENDIX B 


TABLE CXXIV 

Table of Squares, Square Roots, and Reciprocals, 1 to


No. Square Square Root Reciprocal
1 1 1.0000000 1.000000000
2 4 1.4142136 .500000000
3 9 1.7320508 .333333333
4 16 2.0000000 .250000000
5 25 2.2360680 .200000000
6 36 2.4494897 .166666667
7 49 2.6457513 .142857143
8 64 2.8284271 .125000000
9 81 3.0000000 .111111111
10 1 00 3.1622777 .100000000
11 1 21 3.3166248 .090909091
12 1 44 3.4641016 .083333333
13 1 69 3.6055513 .076923077
14 1 96 3.7416574 .071428571
15 2 25 3.8729833 .066666667
16 2 56 4.0000000 .062500000
17 2 89 4.1231056 .058823529
18 3 24 4.2426407 .055555556
19 3 61 4.3588989 .052631579
20 4 00 4.4721360 .050000000
21 4 41 4.5825757 .047619048
22 4 84 4.6904158 .045454545
23 5 29 4.7958315 .043478261
24 5 76 4.8989795 .041666667
25 6 25 5.0000000 .040000000
26 6 76 5.0990195 .038461538
27 7 29 5.1961524 .037037037
28 7 84 5.2915026 .035714286
29 8 41 5.3851648 .034482759
30 9 00 5.4772256 .033333333
31 9 61 5.5677644 .032258065
32 10 24 5.6568542 .031250000
33 10 89 5.7445626 .030303030
34 11 56 5.8309519 .029411765
35 12 25 5.9160798 .028571429
36 12 96 6.0000000 .027777778
37 13 69 6.0827625 .027027027
38 14 44 6.1644140 .026315789
39 15 21 6.2449980 .025641026
40 16 00 6.3245553 .025000000
41 16 81 6.4031242 .024390244
42 17 64 6.4807407 .023809524
43 18 49 6.5574385 .023255814
44 19 36 6.6332496 .022727273
45 20 25 6.7082039 .022222222
46 21 16 6.7823300 .021739130
47 22 09 6.8556546 .021276596
48 23 04 6.9282032 .020833333
49 24 01 7.0000000 .020408163
50 25 00 7.0710678 .020000000

No. Square Square Root Reciprocal
51 26 01 7.1414284 .019607843
52 27 04 7.2111026 .019230769
53 28 09 7.2801099 .018867925
54 29 16 7.3484692 .018518519
55 30 25 7.4161985 .018181818
56 31 36 7.4833148 .017857143
57 32 49 7.5498344 .017543860
58 33 64 7.6157731 .017241379
59 34 81 7.6811457 .016949153
60 36 00 7.7459667 .016666667
61 37 21 7.8102497 .016393443
62 38 44 7.8740079 .016129032
63 39 69 7.9372539 .015873016
64 40 96 8.0000000 .015625000
65 42 25 8.0622577 .015384615
66 43 56 8.1240384 .015151515
67 44 89 8.1853528 .014925373
68 46 24 8.2462113 .014705882
69 47 61 8.3066239 .014492754
70 49 00 8.3666003 .014285714
71 50 41 8.4261498 .014084507
72 51 84 8.4852814 .013888889
73 53 29 8.5440037 .013698630
74 54 76 8.6023253 .013513514
75 56 25 8.6602540 .013333333
76 57 76 8.7177979 .013157895
77 59 29 8.7749644 .012987013
78 60 84 8.8317609 .012820513
79 62 41 8.8881944 .012658228
80 64 00 8.9442719 .012500000
81 65 61 9.0000000 .012345679
82 67 24 9.0553851 .012195122
83 68 89 9.1104336 .012048193
84 70 56 9.1651514 .011904762
85 72 25 9.2195445 .011764706
86 73 96 9.2736185 .011627907
87 75 69 9.3273791 .011494253
88 77 44 9.3808315 .011363636
89 79 21 9.4339811 .011235955
90 81 00 9.4868330 .011111111
91 82 81 9.5393920 .010989011
92 84 64 9.5916630 .010869565
93 86 49 9.6436508 .010752688
94 88 36 9.6953597 .010638298
95 90 25 9.7467943 .010526316
96 92 16 9.7979590 .010416667
97 94 09 9.8488578 .010309278
98 96 04 9.8994949 .010204082
99 98 01 9.9498744 .010101010
100 1 00 00 10.0000000 .010000000


1 The following ten tables from Chaddock, R. E., and Croxton, F. E., Exercises in Statistical Method, by courtesy of Houghton Mifflin Company.
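Any entry of this table can be checked, or the table carried beyond its printed range, by direct computation. What follows is a minimal sketch in Python and is not part of the original text. The function name `table_row` is hypothetical, and the rounding (seven decimals for the square root, nine for the reciprocal, as in the rows for 1 to 100) is an assumption made for illustration.

```python
import math


def table_row(n):
    """Return (n, n squared, square root to 7 decimals, reciprocal to 9 decimals).

    The two rounded values are returned as strings so that trailing
    zeros are kept, as they are in the printed table.
    """
    return n, n * n, f"{math.sqrt(n):.7f}", f"{1 / n:.9f}"


# Print a few rows in the same column order as the table:
# No., Square, Square Root, Reciprocal.
for n in (2, 51, 100):
    print(*table_row(n))
```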

SQUARES, SQUARE ROOTS, AND RECIPROCALS


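No. Square Square Root Reciprocal
101 1 02 01 10.0498756 9900990
102 1 04 04 10.0995049 9803922
103 1 06 09 10.1488916 9708738
104 1 08 16 10.1980390 9615385
105 1 10 25 10.2469508 9523810
106 1 12 36 10.2956301 9433962
107 1 14 49 10.3440804 9345794
108 1 16 64 10.3923048 9259259
109 1 18 81 10.4403065 9174312
110 1 21 00 10.4880885 9090909
111 1 23 21 10.5356538 9009009
112 1 25 44 10.5830052 8928571
113 1 27 69 10.6301458 8849558
114 1 29 96 10.6770783 8771930
115 1 32 25 10.7238053 8695652
116 1 34 56 10.7703296 8620690
117 1 36 89 10.8166538 8547009
118 1 39 24 10.8627805 8474576
119 1 41 61 10.9087121 8403361
120 1 44 00 10.9544512 8333333
121 1 46 41 11.0000000 8264463
122 1 48 84 11.0453610 8196721
123 1 51 29 11.0905365 8130081
124 1 53 76 11.1355287 8064516
125 1 56 25 11.1803399 8000000
126 1 58 76 11.2249722 7936508
127 1 61 29 11.2694277 7874016
128 1 63 84 11.3137085 7812500
129 1 66 41 11.3578167 7751938
130 1 69 00 11.4017543 7692308
131 1 71 61 11.4455231 7633588
132 1 74 24 11.4891253 7575758
133 1 76 89 11.5325626 7518797
134 1 79 56 11.5758369 7462687
135 1 82 25 11.6189500 7407407
136 1 84 96 11.6619038 7352941
137 1 87 69 11.7046999 7299270
138 1 90 44 11.7473401 7246377
139 1 93 21 11.7898261 7194245
140 1 96 00 11.8321596 7142857
141 1 98 81 11.8743422 7092199
142 2 01 64 11.9163753 7042254
143 2 04 49 11.9582607 6993007
144 2 07 36 12.0000000 6944444
145 2 10 25 12.0415946 6896552
146 2 13 16 12.0830460 6849315
147 2 16 09 12.1243557 6802721
148 2 19 04 12.1655251 6756757
149 2 22 01 12.2065556 6711409
150 2 25 00 12.2474487 6666667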

No. Square Square Root Reciprocal 

151 2 28 01 12.2882057 6622517 

152 2 31 04 12.3288280 6578947

153 2 34 09 12.3693169 6535948 

154 2 37 16 12.4096736 6493506 

155 2 40 25 12.4498996 6451613 

156 2 43 36 12.4899960 6410256 

157 2 46 49 12.5299641 6369427

158 2 49 64 12.5698051 6329114 

159 2 52 81 12.6095202 6289308 

160 2 56 00 12.6491106 6250000

161 2 59 21 12.6885775 6211180 

162 2 62 44 12.7279221 6172840 

163 2 65 69 12.7671453 6134969 

164 2 68 96 12.8062485 6097561 

165 2 72 25 12.8452326 6060606 

166 2 75 56 12.8840987 6024096

167 2 78 89 12.9228480 5988024 

168 2 82 24 12.9614814 5952381 

169 2 85 61 13.0000000 5917160 

170 2 89 00 13.0384048 5882353 

171 2 92 41 13.0766968 5847953 

172 2 95 84 13.1148770 5813953 

173 2 99 29 13.1529464 5780347 

174 3 02 76 13.1909060 5747126 

175 3 06 25 13.2287566 5714286 

176 3 09 76 13.2664992 5681818 

177 3 13 29 13.3041347 5649718 

178 3 16 84 13.3416641 5617978 

179 3 20 41 13.3790882 5586592 

180 3 24 00 13.4164079 5555556 

181 3 27 61 13.4536240 5524862 

182 3 31 24 13.4907376 5494505 

183 3 34 89 13.5277493 5464481 

184 3 38 56 13.5646600 5434783 

185 3 42 25 13.6014705 5405405 

186 3 45 96 13.6381817 5376344 

187 3 49 69 13.6747943 5347594 

188 3 53 44 13.7113092 5319149 

189 3 57 21 13.7477271 5291005 

190 3 61 00 13.7840488 5263158

191 3 64 81 13.8202750 5235602 

192 3 68 64 13.8564065 5208333 

193 3 72 49 13.8924440 5181347 

194 3 76 36 13.9283883 5154639 

195 3 80 25 13.9642400 5128205 

196 3 84 16 14.0000000 5102041 

197 3 88 09 14.0356688 5076142

198 3 92 04 14.0712473 5050505

199 3 96 01 14.1067360 5025126

200 4 00 00 14.1421356 5000000




No. Square Square Root Reciprocal 

201 4 04 01 14.1774469 4975124 

202 4 08 04 14.2126704 4950495 

203 4 12 09 14.2478068 4926108 

204 4 16 16 14.2828569 4901961 

205 4 20 25 14.3178211 4878049 

206 4 24 36 14.3527001 4854369 

207 4 28 49 14.3874946 4830918 

208 4 32 64 14.4222051 4807692 

209 4 36 81 14.4568323 4784689 

210 4 41 00 14.4913767 4761905 

211 4 45 21 14.5258390 4739336 

212 4 49 44 14.5602198 4716981 

213 4 53 69 14.5945195 4694836 

214 4 57 96 14.6287388 4672897 

215 4 62 25 14.6628783 4651163 

216 4 66 56 14.6969385 4629630 

217 4 70 89 14.7309199 4608295 

218 4 75 24 14.7648231 4587156 

219 4 79 61 14.7986486 4566210 

220 4 84 00 14.8323970 4545455 

221 4 88 41 14.8660687 4524887 

222 4 92 84 14.8996644 4504505 

223 4 97 29 14.9331845 4484305 

224 5 01 76 14.9666295 4464286 

225 5 06 25 15.0000000 4444444

226 5 10 76 15.0332964 4424779 

227 5 15 29 15.0665192 4405286 

228 5 19 84 15.0996689 4385965

229 5 24 41 15.1327460 4366812

230 5 29 00 15.1657509 4347826

231 5 33 61 15.1986842 4329004 

232 5 38 24 15.2315462 4310345 

233 5 42 89 15.2643375 4291845

234 5 47 56 15.2970585 4273504

235 5 52 25 15.3297097 4255319

236 5 56 96 15.3622915 4237288

237 5 61 69 15.3948043 4219409

238 5 66 44 15.4272486 4201681 

239 5 71 21 15.4596248 4184100 

240 5 76 00 15.4919334 4166667 

241 5 80 81 15.5241747 4149378 

242 5 85 64 15.5563492 4132231 

243 5 90 49 15.5884573 4115226 

244 5 95 36 15.6204994 4098361 

245 6 00 25 15.6524758 4081633 

246 6 05 16 15.6843871 4065041 

247 6 10 09 15.7162336 4048583

248 6 15 04 15.7480157 4032258

249 6 20 01 15.7797338 4016064

250 6 25 00 15.8113883 4000000 


No. Square Square Root Reciprocal

251 6 30 01 15.8429795 3984064 

252 6 35 04 15.8745079 3968254 

253 6 40 09 15.9059737 3952569 

254 6 45 16 15.9373775 3937008 

255 6 50 25 15.9687194 3921569 

256 6 55 36 16.0000000 3906250 

257 6 60 49 16.0312195 3891051 

258 6 65 64 16.0623784 3875969 

259 6 70 81 16.0934769 3861004 

260 6 76 00 16.1245155 3846154 

261 6 81 21 16.1554944 3831418

262 6 86 44 16.1864141 3816794 

263 6 91 69 16.2172747 3802281 

264 6 96 96 16.2480768 3787879 

265 7 02 25 16.2788206 3773585 

266 7 07 56 16.3095064 3759398 

267 7 12 89 16.3401346 3745318 

268 7 18 24 16.3707055 3731343 

269 7 23 61 16.4012195 3717472 

270 7 29 00 16.4316767 3703704 

271 7 34 41 16.4620776 3690037 

272 7 39 84 16.4924225 3676471 

273 7 45 29 16.5227116 3663004 

274 7 50 76 16.5529454 3649635 

275 7 56 25 16.5831240 3636364 

276 7 61 76 16.6132477 3623188

277 7 67 29 16.6433170 3610108 

278 7 72 84 16.6733320 3597122 

279 7 78 41 16.7032931 3584229 

280 7 84 00 16.7332005 3571429 

281 7 89 61 16.7630546 3558719 

282 7 95 24 16.7928556 3546099 

283 8 00 89 16.8226038 3533569 

284 8 06 56 16.8522995 3521127 

285 8 12 25 16.8819430 3508772 

286 8 17 96 16.9115345 3496503 

287 8 23 69 16.9410743 3484321 

288 8 29 44 16.9705627 3472222 

289 8 35 21 17.0000000 3460208 

290 8 41 00 17.0293864 3448276

291 8 46 81 17.0587221 3436426

292 8 52 64 17.0880075 3424658

293 8 58 49 17.1172428 3412969 

294 8 64 36 17.1464282 3401361 

295 8 70 25 17.1755640 3389831 

296 8 76 16 17.2046505 3378378 

297 8 82 09 17.2336879 3367003 

298 8 88 04 17.2626765 3355705 

299 8 94 01 17.2916165 3344482 

300 9 00 00 17.3205081 3333333 





No. Square Square Root Reciprocal 

301 9 06 01 17.3493516 3322259 

302 9 12 04 17.3781472 3311258 

303 9 18 09 17.4068952 3300330 

304 9 24 16 17.4355958 3289474 

305 9 30 25 17.4642492 3278689 

306 9 36 36 17.4928557 3267974 

307 9 42 49 17.5214155 3257329 

308 9 48 64 17.5499288 3246753 

309 9 54 81 17.5783958 3236246 

310 9 61 00 17.6068169 3225806 

311 9 67 21 17.6351921 3215434 

312 9 73 44 17.6635217 3205128 

313 9 79 69 17.6918060 3194888 

314 9 85 96 17.7200451 3184713 

315 9 92 25 17.7482393 3174603 

316 9 98 56 17.7763888 3164557 

317 10 04 89 17.8044938 3154574 

318 10 11 24 17.8325545 3144654 

319 10 17 61 17.8605711 3134796 

320 10 24 00 17.8885438 3125000 

321 10 30 41 17.9164729 3115265 

322 10 36 84 17.9443584 3105590 

323 10 43 29 17.9722008 3095975 

324 10 49 76 18.0000000 3086420 

325 10 56 25 18.0277564 3076923 

326 10 62 76 18.0554701 3067485 

327 10 69 29 18.0831413 3058104 

328 10 75 84 18.1107703 3048780 

329 10 82 41 18.1383571 3039514 

330 10 89 00 18.1659021 3030303 

331 10 95 61 18.1934054 3021148

332 11 02 24 18.2208672 3012048

333 11 08 89 18.2482876 3003003

334 11 15 56 18.2756669 2994012 

335 11 22 25 18.3030052 2985075 

336 11 28 96 18.3303028 2976190 

337 11 35 69 18.3575598 2967359

338 11 42 44 18.3847763 2958580 

339 11 49 21 18.4119526 2949853 

340 11 56 00 18.4390889 2941176 

341 11 62 81 18.4661853 2932551 

342 11 69 64 18.4932420 2923977

343 11 76 49 18.5202592 2915452

344 11 83 36 18.5472370 2906977 

345 11 90 25 18.5741756 2898551 

346 11 97 16 18.6010752 2890173 

347 12 04 09 18.6279360 2881844 

348 12 11 04 18.6547581 2873563

349 12 18 01 18.6815417 2865330 

350 12 25 00 18.7082869 2857143 








No. Square Square Root Reciprocal 

401 16 08 01 20.0249844 2493766 

402 16 16 04 20.0499377 2487562 

403 16 24 09 20.0748599 2481390 

404 16 32 16 20.0997512 2475248 

405 16 40 25 20.1246118 2469136 

406 16 48 36 20.1494417 2463054 

407 16 56 49 20.1742410 2457002 

408 16 64 64 20.1990099 2450980 

409 16 72 81 20.2237484 2444988 

410 16 81 00 20.2484567 2439024 

411 16 89 21 20.2731349 2433090 

412 16 97 44 20.2977831 2427184 

413 17 05 69 20.3224014 2421308 

414 17 13 96 20.3469899 2415459 

415 17 22 25 20.3715488 2409639 

416 17 30 56 20.3960781 2403846 

417 17 38 89 20.4205779 2398082 

418 17 47 24 20.4450483 2392344 

419 17 55 61 20.4694895 2386635 

420 17 64 00 20.4939015 2380952 

421 17 72 41 20.5182845 2375297 

422 17 80 84 20.5426386 2369668 

423 17 89 29 20.5669638 2364066 

424 17 97 76 20.5912603 2358491 

425 18 06 25 20.6155281 2352941 

426 18 14 76 20.6397674 2347418 

427 18 23 29 20.6639783 2341920 

428 18 31 84 20.6881609 2336449 

429 18 40 41 20.7123152 2331002 

430 18 49 00 20.7364414 2325581 

431 18 57 61 20.7605395 2320186 

432 18 66 24 20.7846097 2314815 

433 18 74 89 20.8086520 2309469 

434 18 83 56 20.8326667 2304147 

435 18 92 25 20.8566536 2298851 

436 19 00 96 20.8806130 2293578 

437 19 09 69 20.9045450 2288330 

438 19 18 44 20.9284495 2283105

439 19 27 21 20.9523268 2277904 

440 19 36 00 20.9761770 2272727 

441 19 44 81 21.0000000 2267574 

442 19 53 64 21.0237960 2262443 

443 19 62 49 21.0475652 2257336 

444 19 71 36 21.0713075 2252252 

445 19 80 25 21.0950231 2247191 

446 19 89 16 21.1187121 2242152 

447 19 98 09 21.1423745 2237136 

448 20 07 04 21.1660105 2232143 

449 20 16 01 21.1896201 2227171

450 20 25 00 21.2132034 2222222 


No. Square Square Root Reciprocal

451 20 34 01 21.2367606 2217295 

452 20 43 04 21.2602916 2212389 

453 20 52 09 21.2837967 2207506 

454 20 61 16 21.3072758 2202643

455 20 70 25 21.3307290 2197802 

456 20 79 36 21.3541565 2192982 

457 20 88 49 21.3775583 2188184 

458 20 97 64 21.4009346 2183406 

459 21 06 81 21.4242853 2178649 

460 21 16 00 21.4476106 2173913

461 21 25 21 21.4709106 2169197 

462 21 34 44 21.4941853 2164502 

463 21 43 69 21.5174348 2159827 

464 21 52 96 21.5406592 2155172

465 21 62 25 21.5638587 2150538 

466 21 71 56 21.5870331 2145923 

467 21 80 89 21.6101828 2141328 

468 21 90 24 21.6333077 2136752 

469 21 99 61 21.6564078 2132196 

470 22 09 00 21.6794834 2127660 

471 22 18 41 21.7025344 2123142 

472 22 27 84 21.7255610 2118644 

473 22 37 29 21.7485632 2114165 

474 22 46 76 21.7715411 2109705 

475 22 56 25 21.7944947 2105263 

476 22 65 76 21.8174242 2100840 

477 22 75 29 21.8403297 2096436 

478 22 84 84 21.8632111 2092050 

479 22 94 41 21.8860686 2087683 

480 23 04 00 21.9089023 2083333 

481 23 13 61 21.9317122 2079002 

482 23 23 24 21.9544984 2074689 

483 23 32 89 21.9772610 2070393 

484 23 42 56 22.0000000 2066116 

485 23 52 25 22.0227155 2061856 

486 23 61 96 22.0454077 2057613 

487 23 71 69 22.0680765 2053388

488 23 81 44 22.0907220 2049180 

489 23 91 21 22.1133444 2044990 

490 24 01 00 22.1359436 2040816 

491 24 10 81 22.1585198 2036660 

492 24 20 64 22.1810730 2032520 

493 24 30 49 22.2036033 2028398 

494 24 40 36 22.2261108 2024291 

495 24 50 25 22.2485955 2020202 

496 24 60 16 22.2710575 2016129 

497 24 70 09 22.2934968 2012072 

498 24 80 04 22.3159136 2008032 

499 24 90 01 22.3383079 2004008 

500 25 00 00 22.3606798 2000000 









No. Square Square Root Reciprocal 

501 25 10 01 22.3830293 1996008 

502 25 20 04 22.4053565 1992032 

503 25 30 09 22.4276615 1988072 

504 25 40 16 22.4499443 1984127 

505 25 50 25 22.4722051 1980198 

506 25 60 36 22.4944438 1976285 

507 25 70 49 22.5166605 1972387 

508 25 80 64 22.5388553 1968504 

509 25 90 81 22.5610283 1964637 

510 26 01 00 22.5831796 1960784 

511 26 11 21 22.6053091 1956947 

512 26 21 44 22.6274170 1953125 

513 26 31 69 22.6495033 1949318 

514 26 41 96 22.6715681 1945525 

515 26 52 25 22.6936114 1941748

516 26 62 56 22.7156334 1937984

517 26 72 89 22.7376340 1934236

518 26 83 24 22.7596134 1930502

519 26 93 61 22.7815715 1926782 

520 27 04 00 22.8035085 1923077 

521 27 14 41 22.8254244 1919386 

522 27 24 84 22.8473193 1915709 

523 27 35 29 22.8691933 1912046

524 27 45 76 22.8910463 1908397 

525 27 56 25 22.9128785 1904762 

526 27 66 76 22.9346899 1901141

527 27 77 29 22.9564806 1897533 

528 27 87 84 22.9782506 1893939 

529 27 98 41 23.0000000 1890359

530 28 09 00 23.0217289 1886792 

531 28 19 61 23.0434372 1883239 

532 28 30 24 23.0651252 1879699 

533 28 40 89 23.0867928 1876173 

534 28 51 56 23.1084400 1872659 

535 28 62 25 23.1300670 1869159

536 28 72 96 23.1516738 1865672 

537 28 83 69 23.1732605 1862197 

538 28 94 44 23.1948270 1858736 

539 29 05 21 23.2163735 1855288 

540 29 16 00 23.2379001 1851852 

541 29 26 81 23.2594067 1848429 

542 29 37 64 23.2808935 1845018 

543 29 48 49 23.3023604 1841621 

544 29 59 36 23.3238076 1838235 

545 29 70 25 23.3452351 1834862 

546 29 81 16 23.3666429 1831502 

547 29 92 09 23.3880311 1828154 

548 30 03 04 23.4093998 1824818 

549 30 14 01 23.4307490 1821494 

550 30 25 00 23.4520788 1818182 


No. Square Square Root Reciprocal 

551 30 36 01 23.4733892 1814882 

552 30 47 04 23.4946802 1811594 

553 30 58 09 23.5159520 1808318 

554 30 69 16 23.5372046 1805054 

555 30 80 25 23.5584380 1801802 

556 30 91 36 23.5796522 1798561 

557 31 02 49 23.6008474 1795332 

558 31 13 64 23.6220236 1792115 

559 31 24 81 23.6431808 1788909

560 31 36 00 23.6643191 1785714 

561 31 47 21 23.6854386 1782531 

562 31 58 44 23.7065392 1779359 

563 31 69 69 23.7276210 1776199

564 31 80 96 23.7486842 1773050 

565 31 92 25 23.7697286 1769912 

566 32 03 56 23.7907545 1766784 

567 32 14 89 23.8117618 1763668 

568 32 26 24 23.8327506 1760563 

569 32 37 61 23.8537209 1757469 

570 32 49 00 23.8746728 1754386 

571 32 60 41 23.8956063 1751313 

572 32 71 84 23.9165215 1748252 

573 32 83 29 23.9374184 1745201 

574 32 94 76 23.9582971 1742160 

575 33 06 25 23.9791576 1739130 

576 33 17 76 24.0000000 1736111

577 33 29 29 24.0208243 1733102

578 33 40 84 24.0416306 1730104

579 33 52 41 24.0624188 1727116

580 33 64 00 24.0831891 1724138 

581 33 75 61 24.1039416 1721170 

582 33 87 24 24.1246762 1718213 

583 33 98 89 24.1453929 1715266 

584 34 10 56 24.1660919 1712329 

585 34 22 25 24.1867732 1709402 

586 34 33 96 24.2074369 1706485 

587 34 45 69 24.2280829 1703578 

588 34 57 44 24.2487113 1700680 

589 34 69 21 24.2693222 1697793 

590 34 81 00 24.2899156 1694915 

591 34 92 81 24.3104916 1692047 

592 35 04 64 24.3310501 1689189 

593 35 16 49 24.3515913 1686341 

594 35 28 36 24.3721152 1683502 

595 35 40 25 24.3926218 1680672 

596 35 52 16 24.4131112 1677852 

597 35 64 09 24.4335834 1675042 

598 35 76 04 24.4540385 1672241 

599 35 88 01 24.4744765 1669449 

600 36 00 00 24.4948974 1666667 





No. Square Square Root Reciprocal .00
601 36 12 01 24.5153013 1663894
602 36 24 04 24.5356883 1661130
603 36 36 09 24.5560583 1658375
604 36 48 16 24.5764115 1655629
605 36 60 25 24.5967478 1652893
606 36 72 36 24.6170673 1650165
607 36 84 49 24.6373700 1647446
608 36 96 64 24.6576560 1644737
609 37 08 81 24.6779254 1642036
610 37 21 00 24.6981781 1639344
611 37 33 21 24.7184142 1636661
612 37 45 44 24.7386338 1633987
613 37 57 69 24.7588368 1631321
614 37 69 96 24.7790234 1628664
615 37 82 25 24.7991935 1626016
616 37 94 56 24.8193473 1623377
617 38 06 89 24.8394847 1620746
618 38 19 24 24.8596058 1618123
619 38 31 61 24.8797106 1615509
620 38 44 00 24.8997992 1612903
621 38 56 41 24.9198716 1610306
622 38 68 84 24.9399278 1607717
623 38 81 29 24.9599679 1605136
624 38 93 76 24.9799920 1602564
625 39 06 25 25.0000000 1600000
626 39 18 76 25.0199920 1597444
627 39 31 29 25.0399681 1594896
628 39 43 84 25.0599282 1592357
629 39 56 41 25.0798724 1589825
630 39 69 00 25.0998008 1587302
631 39 81 61 25.1197134 1584786
632 39 94 24 25.1396102 1582278
633 40 06 89 25.1594913 1579779
634 40 19 56 25.1793566 1577287
635 40 32 25 25.1992063 1574803
636 40 44 96 25.2190404 1572327
637 40 57 69 25.2388589 1569859
638 40 70 44 25.2586619 1567398
639 40 83 21 25.2784493 1564945
640 40 96 00 25.2982213 1562500
641 41 08 81 25.3179778 1560062
642 41 21 64 25.3377189 1557632
643 41 34 49 25.3574447 1555210
644 41 47 36 25.3771551 1552795
645 41 60 25 25.3968502 1550388
646 41 73 16 25.4165301 1547988
647 41 86 09 25.4361947 1545595
648 41 99 04 25.4558441 1543210
649 42 12 01 25.4754784 1540832
650 42 25 00 25.4950976 1538462

No. Square Square Root Reciprocal .00
651 42 38 01 25.5147016 1536098
652 42 51 04 25.5342907 1533742
653 42 64 09 25.5538647 1531394
654 42 77 16 25.5734237 1529052
655 42 90 25 25.5929678 1526718
656 43 03 36 25.6124969 1524390
657 43 16 49 25.6320112 1522070
658 43 29 64 25.6515107 1519757
659 43 42 81 25.6709953 1517451
660 43 56 00 25.6904652 1515152
661 43 69 21 25.7099203 1512859
662 43 82 44 25.7293607 1510574
663 43 95 69 25.7487864 1508296
664 44 08 96 25.7681975 1506024
665 44 22 25 25.7875939 1503759
666 44 35 56 25.8069758 1501502
667 44 48 89 25.8263431 1499250
668 44 62 24 25.8456960 1497006
669 44 75 61 25.8650343 1494768
670 44 89 00 25.8843582 1492537
671 45 02 41 25.9036677 1490313
672 45 15 84 25.9229628 1488095
673 45 29 29 25.9422435 1485884
674 45 42 76 25.9615100 1483680
675 45 56 25 25.9807621 1481481
676 45 69 76 26.0000000 1479290
677 45 83 29 26.0192237 1477105
678 45 96 84 26.0384331 1474926
679 46 10 41 26.0576284 1472754
680 46 24 00 26.0768096 1470588
681 46 37 61 26.0959767 1468429
682 46 51 24 26.1151297 1466276
683 46 64 89 26.1342687 1464129
684 46 78 56 26.1533937 1461988
685 46 92 25 26.1725047 1459854
686 47 05 96 26.1916017 1457726
687 47 19 69 26.2106848 1455604
688 47 33 44 26.2297541 1453488
689 47 47 21 26.2488095 1451379
690 47 61 00 26.2678511 1449275
691 47 74 81 26.2868789 1447178
692 47 88 64 26.3058929 1445087
693 48 02 49 26.3248932 1443001
694 48 16 36 26.3438797 1440922
695 48 30 25 26.3628527 1438849
696 48 44 16 26.3818119 1436782
697 48 58 09 26.4007576 1434720
698 48 72 04 26.4196896 1432665
699 48 86 01 26.4386081 1430615
700 49 00 00 26.4575131 1428571






No. Square Square Root Reciprocal 

701 49 14 01 26.4764046 1426534 

702 49 28 04 26.4952826 1424501 

703 49 42 09 26.5141472 1422475 

704 49 56 16 26.5329983 1420455 

705 49 70 25 26.5518361 1418440 

706 49 84 36 26.5706605 1416431 

707 49 98 49 26.5894716 1414427 

708 50 12 64 26.6082694 1412429 

709 50 26 81 26.6270539 1410437 

710 50 41 00 26.6458252 1408451 

711 50 55 21 26.6645833 1406470 

712 50 69 44 26.6833281 1404494 

713 50 83 69 26.7020598 1402525 

714 50 97 96 26.7207784 1400560 

715 51 12 25 26.7394839 1398601 

716 51 26 56 26.7581763 1396648 

717 51 40 89 26.7768557 1394700 

718 51 55 24 26.7955220 1392758 

719 51 69 61 26.8141754 1390821 

720 51 84 00 26.8328157 1388889 

721 51 98 41 26.8514432 1386963 

722 52 12 84 26.8700577 1385042 

723 52 27 29 26.8886593 1383126 

724 52 41 76 26.9072481 1381215 

725 52 56 25 26.9258240 1379310 

726 52 70 76 26.9443872 1377410 

727 52 85 29 26.9629375 1375516 

728 52 99 84 26.9814751 1373626 

729 53 14 41 27.0000000 1371742 

730 53 29 00 27.0185122 1369863 

731 53 43 61 27.0370117 1367989 

732 53 58 24 27.0554985 1366120 

733 53 72 89 27.0739727 1364256 

734 53 87 56 27.0924344 1362398 

735 54 02 25 27.1108834 1360544 

736 54 16 96 27.1293199 1358696 

737 54 31 69 27.1477439 1356852

738 54 46 44 27.1661554 1355014 

739 54 61 21 27.1845544 1353180 

740 54 76 00 27.2029410 1351351 

741 54 90 81 27.2213152 1349528 

742 55 05 64 27.2396769 1347709 

743 55 20 49 27.2580263 1345895

744 55 35 36 27.2763634 1344086

745 55 50 25 27.2946881 1342282 

746 55 65 16 27.3130006 1340483 

747 55 80 09 27.3313007 1338688 

748 55 95 04 27.3495887 1336898 

749 56 10 01 27.3678644 1335113

750 56 25 00 27.3861279 1333333 


No. Square Square Root Reciprocal

751 56 40 01 27.4043792 1331558 

752 56 55 04 27.4226184 1329787 

753 56 70 09 27.4408455 1328021 

754 56 85 16 27.4590604 1326260 

755 57 00 25 27.4772633 1324503 

756 57 15 36 27.4954542 1322751 

757 57 30 49 27.5136330 1321004 

758 57 45 64 27.5317998 1319261 

759 57 60 81 27.5499546 1317523 

760 57 76 00 27.5680975 1315789 

761 57 91 21 27.5862284 1314060

762 58 06 44 27.6043475 1312336 

763 58 21 69 27.6224546 1310616 

764 58 36 96 27.6405499 1308901 

765 58 52 25 27.6586334 1307190 

766 58 67 56 27.6767050 1305483

767 58 82 89 27.6947648 1303781 

768 58 98 24 27.7128129 1302083 

769 59 13 61 27.7308492 1300390 

770 59 29 00 27.7488739 1298701 

771 59 44 41 27.7668868 1297017 

772 59 59 84 27.7848880 1295337 

773 59 75 29 27.8028775 1293661 

774 59 90 76 27.8208555 1291990 

775 60 06 25 27.8388218 1290323 

776 60 21 76 27.8567766 1288660

777 60 37 29 27.8747197 1287001 

778 60 52 84 27.8926514 1285347 

779 60 68 41 27.9105715 1283697

780 60 84 00 27.9284801 1282051

781 60 99 61 27.9463772 1280410

782 61 15 24 27.9642629 1278772

783 61 30 89 27.9821372 1277139

784 61 46 56 28.0000000 1275510 

785 61 62 25 28.0178515 1273885 

786 61 77 96 28.0356915 1272265 

787 61 93 69 28.0535203 1270648 

788 62 09 44 28.0713377 1269036 

789 62 25 21 28.0891438 1267427 

790 62 41 00 28.1069386 1265823 

791 62 56 81 28.1247222 1264223 

792 62 72 64 28.1424946 1262626 

793 62 88 49 28.1602557 1261034 

794 63 04 36 28.1780056 1259446 

795 63 20 25 28.1957444 1257862 

796 63 36 16 28.2134720 1256281 

797 63 52 09 28.2311884 1254705

798 63 68 04 28.2488938 1253133 

799 63 84 01 28.2665881 1251564 

800 64 00 00 28.2842712 1250000 






No. Square Square Root Reciprocal 

801 64 16 01 28.3019434 1248439 

802 64 32 04 28.3196045 1246883 

803 64 48 09 28.3372546 1245330 

804 64 64 16 28.3548938 1243781 

805 64 80 25 28.3725219 1242236 

806 64 96 36 28.3901391 1240695 

807 65 12 49 28.4077454 1239157 

808 65 28 64 28.4253408 1237624 

809 65 44 81 28.4429253 1236094 

810 65 61 00 28.4604989 1234568 

811 65 77 21 28.4780617 1233046 

812 65 93 44 28.4956137 1231527 

813 66 09 69 28.5131549 1230012 

814 66 25 96 28.5306852 1228501 

815 66 42 25 28.5482048 1226994 

816 66 58 56 28.5657137 1225490 

817 66 74 89 28.5832119 1223990 

818 66 91 24 28.6006993 1222494 

819 67 07 61 28.6181760 1221001 

820 67 24 00 28.6356421 1219512 

821 67 40 41 28.6530976 1218027 

822 67 56 84 28.6705424 1216545 

823 67 73 29 28.6879766 1215067 

824 67 89 76 28.7054002 1213592 

825 68 06 25 28.7228132 1212121 

826 68 22 76 28.7402157 1210654 

827 68 39 29 28.7576077 1209190 

828 68 55 84 28.7749891 1207729 

829 68 72 41 28.7923601 1206273 

830 68 89 00 28.8097206 1204819 

831 69 05 61 28.8270706 1203369 

832 69 22 24 28.8444102 1201923 

833 69 38 89 28.8617394 1200480 

834 69 55 56 28.8790582 1199041 

835 69 72 25 28.8963666 1197605 

836 69 88 96 28.9136646 1196172 

837 70 05 69 28.9309523 1194743 

838 70 22 44 28.9482297 1193317 

839 70 39 21 28.9654967 1191895 

840 70 56 00 28.9827535 1190476 

841 70 72 81 29.0000000 1189061 

842 70 89 64 29.0172363 1187648 

843 71 06 49 29.0344623 1186240 

844 71 23 36 29.0516781 1184834 

845 71 40 25 29.0688837 1183432 

846 71 57 16 29.0860791 1182033 

847 71 74 09 29.1032644 1180638 

848 71 91 04 29.1204396 1179245 

849 72 08 01 29.1376046 1177856

850 72 25 00 29.1547595 1176471 


No. Square Square Root Reciprocal 

851 72 42 01 29.1719043 1175088 

852 72 59 04 29.1890390 1173709 

853 72 76 09 29.2061637 1172333

854 72 93 16 29.2232784 1170960 

855 73 10 25 29.2403830 1169591 

856 73 27 36 29.2574777 1168224 

857 73 44 49 29.2745623 1166861 

858 73 61 64 29.2916370 1165501 

859 73 78 81 29.3087018 1164144 

860 73 96 00 29.3257566 1162791 

861 74 13 21 29.3428015 1161440 

862 74 30 44 29.3598365 1160093 

863 74 47 69 29.3768616 1158749 

864 74 64 96 29.3938769 1157407 

865 74 82 25 29.4108823 1156069 

866 74 99 56 29.4278779 1154734 

867 75 16 89 29.4448637 1153403 

868 75 34 24 29.4618397 1152074 

869 75 51 61 29.4788059 1150748 

870 75 69 00 29.4957624 1149425 

871 75 86 41 29.5127091 1148106 

872 76 03 84 29.5296461 1146789 

873 76 21 29 29.5465734 1145475 

874 76 38 76 29.5634910 1144165 

875 76 56 25 29.5803989 1142857 

876 76 73 76 29.5972972 1141553 

877 76 91 29 29.6141858 1140251 

878 77 08 84 29.6310648 1138952 

879 77 26 41 29.6479342 1137656 

880 77 44 00 29.6647939 1136364

881 77 61 61 29.6816442 1135074 

882 77 79 24 29.6984848 1133787 

883 77 96 89 29.7153159 1132503 

884 78 14 56 29.7321375 1131222 

885 78 32 25 29.7489496 1129944 

886 78 49 96 29.7657521 1128668 

887 78 67 69 29.7825452 1127396 

888 78 85 44 29.7993289 1126126 

889 79 03 21 29.8161030 1124859 

890 79 21 00 29.8328678 1123596 

891 79 38 81 29.8496231 1122334 

892 79 56 64 29.8663690 1121076 

893 79 74 49 29.8831056 1119821 

894 79 92 36 29.8998328 1118568 

895 80 10 25 29.9165506 1117318 

896 80 28 16 29.9332591 1116071 

897 80 46 09 29.9499583 1114827 

898 80 64 04 29.9666481 1113586 

899 80 82 01 29.9833287 1112347 

900 81 00 00 30.0000000 1111111 






No. Square Square Root Reciprocal .00
901 81 18 01 30.0166620 1109878
902 81 36 04 30.0333148 1108647
903 81 54 09 30.0499584 1107420
904 81 72 16 30.0665928 1106195
905 81 90 25 30.0832179 1104972
906 82 08 36 30.0998339 1103753
907 82 26 49 30.1164407 1102536
908 82 44 64 30.1330383 1101322
909 82 62 81 30.1496269 1100110
910 82 81 00 30.1662063 1098901
911 82 99 21 30.1827765 1097695
912 83 17 44 30.1993377 1096491
913 83 35 69 30.2158899 1095290
914 83 53 96 30.2324329 1094092
915 83 72 25 30.2489669 1092896
916 83 90 56 30.2654919 1091703
917 84 08 89 30.2820079 1090513
918 84 27 24 30.2985148 1089325
919 84 45 61 30.3150128 1088139
920 84 64 00 30.3315018 1086957
921 84 82 41 30.3479818 1085776
922 85 00 84 30.3644529 1084599
923 85 19 29 30.3809151 1083424
924 85 37 76 30.3973683 1082251
925 85 56 25 30.4138127 1081081
926 85 74 76 30.4302481 1079914
927 85 93 29 30.4466747 1078749
928 86 11 84 30.4630924 1077586
929 86 30 41 30.4795013 1076426
930 86 49 00 30.4959014 1075269
931 86 67 61 30.5122926 1074114
932 86 86 24 30.5286750 1072961
933 87 04 89 30.5450487 1071811
934 87 23 56 30.5614136 1070664
935 87 42 25 30.5777697 1069519
936 87 60 96 30.5941171 1068376
937 87 79 69 30.6104557 1067236
938 87 98 44 30.6267857 1066098
939 88 17 21 30.6431069 1064963
940 88 36 00 30.6594194 1063830
941 88 54 81 30.6757233 1062699
942 88 73 64 30.6920185 1061571
943 88 92 49 30.7083051 1060445
944 89 11 36 30.7245830 1059322
945 89 30 25 30.7408523 1058201
946 89 49 16 30.7571130 1057082
947 89 68 09 30.7733651 1055966
948 89 87 04 30.7896086 1054852
949 90 06 01 30.8058436 1053741
950 90 25 00 30.8220700 1052632


No. 

Square 

Square Root 

Reciprocal 

951 

90 44 01 

30.8382879 

1051525 

952 

90 63 04 

30.8544972 

1050420 

953 

90 82 09 

30.8706981 

1049318 

954 

91 01 16 

30.8868904 

1048218 

955 

91 20 25 

30.9030743 

1047120 

956 

91 39 36 

30.9192497 

1046025 

957 

91 58 49 

30.9354166 

1044932 

958 

91 77 64 

30.9515751 

1043841 

959 

91 96 81 

30.9677251 

1042753 

960 

92 16 00 

30.9838668 

1041667 

961 

92 35 21 

31.0000000 

1040583 

962 

92 54 44 

31.0161248 

1039501 

963 

92 73 69 

31.0322413 

1038422 

964 

92 92 96 

31.0483494 

1037344 

965 

93 12 25 

31.0644491 

1036269 

966 

93 31 56

31.0805405 

1035197 

967 

93 50 89 

31.0966236 

1034126 

968 

93 70 24 

31.1126984 

1033058 

969 

93 89 61 

31.1287648 

1031992 

970 

94 09 00 

31.1448230 

1030928 

971 

94 28 41 

31.1608729 

1029866 

972 

94 47 84 

31.1769145 

1028807 

973 

94 67 29 

31.1929479 

1027749 

974 

94 86 76 

31.2089731 

1026694 

975 

95 06 25 

31.2249900 

1025641 

976 

95 25 76 

31.2409987

1024590 

977 

95 45 29 

31.2569992 

1023541 

978

95 64 84 

31.2729915 

1022495 

979 

95 84 41 

31.2889757 

1021450 

980 

96 04 00 

31.3049517 

1020408 

981 

96 23 61 

31.3209195 

1019368 

982 

96 43 24 

31.3368792 

1018330 

983 

96 62 89 

31.3528308 

1017294 

984 

96 82 56 

31.3687743 

1016260 

985 

97 02 25 

31.3847097 

1015228 

986 

97 21 96 

31.4006369 

1014199 

987 

97 41 69 

31.4165561 

1013171 

988 

97 61 44 

31.4324673 

1012146 

989 

97 81 21 

31.4483704 

1011122 

990 

98 01 00 

31.4642654 

1010101 

991 

98 20 81 

31.4801525 

1009082 

992 

98 40 64 

31.4960315 

1008065 

993 

98 60 49 

31.5119025 

1007049 

994 

98 80 36 

31.5277655 

1006036 

995 

99 00 25 

31.5436206 

1005025 

996 

99 20 16 

31.5594677 

1004016 

997 

99 40 09 

31.5753068 

1003009 

998 

99 60 04 

31.5911380 

1002004 

999 

99 80 01 

31.6069613 

1001001 

1000 

100 00 00 

31.6227766 

1000000 
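
Any row of the preceding table can be regenerated by direct computation, which is a convenient check on a doubtful entry. The following is a minimal sketch in Python and is not part of the original text. The function name `table_row` is hypothetical, and the reciprocal is expressed as the seven digits that follow the printed `.00` prefix, an assumption that holds only for numbers from 101 to 1000.

```python
import math


def table_row(n: int) -> tuple[int, int, str, int]:
    """Return (number, square, square root to 7 decimals, reciprocal digits).

    The reciprocal digits are those printed after the '.00' prefix of the
    table, i.e. 1/n multiplied by 10**9 and rounded (assumes 101 <= n <= 1000).
    """
    square = n * n
    root = f"{math.sqrt(n):.7f}"
    recip_digits = round(1_000_000_000 / n)
    return n, square, root, recip_digits


if __name__ == "__main__":
    # Print a few rows for comparison with the printed table.
    for number in (900, 961, 1000):
        print(table_row(number))
```

Comparing the printed cells with the computed ones is usually enough to settle whether a smudged digit is, say, a 3 or an 8.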












APPENDIX C 


TABLE CXXV 

Common Logarithms and Proportional Parts 1 


Numbers 100-150 Logs 00000-17869 



N | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | P.P.

100 00 000 00 043 00 087 00 130 00 173 00 217 00 260 00 303 00 346 00 389


43 

4.3

8.6

12.9
17.2 

21.5 
25.8 
30.1 
34.4 
38.7 

42 

4.2

8.4
12.6
16.8
21.0

25.2 

29.4 

33.6 

37.8 

39 

3.9 

7.8 

11.7 

15.6 

19.5 
23.4 

27.3 
31.2 

35.1 

36 

3.6 
7.2 

10.8 

14.4 
18.0 
21.6

25.2 

28.8 

32.4 

41 

4.1 

8.2 

12.3 

16.4 

20.5 

24.6 

28.7 

32.8 

36.9 

38 

3.8 

7.6 

11.4 

15.2 

19.0 
22.8 
26.6 

30.4 

34.2 

35 

3.5 

7.0 

10.5 

14.0 

17.5 

21.0 

24.5 
28.0 

31.5 

101 

102 

103 

104 

105 

106 

107 

108 
109 

432 

860

01 284 
703 

02 119 
531 
938 

03 342 
743 

475 

903 

01 326 
745 

02 160 
572 
979 

03 383 
782 

518

945

01 368
787 

02 202 
612 

03 019 
423 
822 

561 

988

01 410
828 

02 243 
653 

03 060 
463 
862 

604

01 030
452
870

02 284 
694 

03 100
503 
902 

647

01 072
494
912 

02 325 
735 

03 141 
543 
941 

689

01 115
536
953

02 366
776

03 181
583
981 

732 

01 157 
578 
995 

02 407 
816 

03 222 
623 

04 021 

775 

01 199
620

02 036 
449 
857 

03 262 
663 

04 060 

817 

01 242 
662 

02 078 
490 
898 

03 302 
703 

04 100

04 493 

1 

2 

3 

4 

5 

6 

7 

8 
9 

110 

04 139 

04 179 

04 218 

04 258 

04 297 

04 336

04 376

04 415 

04 454 

40 

4.0 

8.0 
12.0 
16.0 
20.0 

24.0 

28.0 

32.0 

36.0 

111 

112 

113 

114 

115 

116 

117 

118 
119 

532 

922 

05 308 
690 

06 070 
446 
819 

07 188 
555 

07 918 

571 

961 

05 346 
729 

06 108 
483 
856 

07 225 
591 

610 

999 

05 385 
767 

06 145 
521 
893 

07 262 
628 

650 

05 038 
423 
805 

06 183 
558 
930 

07 298 
664 

689 

05 077 
461 
843 

06 221 
595 
967 

07 335 
700 

727 

05 115 
500 
881 

06 258 
633 

07 004 
372 
737 

766 

05 154 
538 
918 

06 296 
670 

07 041 
408 
773 

805 

05 192 
576 
956 

06 333 
707 

07 078 
445 
809 

844 

05 231 
614 
994 

06 371 
744 

07 115 
482
846 

883 

05 269 
652 

06 032 
408 
781 

07 151 
518 
882 

1

2 

3 

4 

5 

6 

7 

8 

9 

120 

07 954 

07 990 

08 027 

08 063 

08 099 

08 135 

08 171 

08 207 

08 243

1

2 

3 

4 

5 

6 

7 

8 

9 

1

2 

3 

4 

5 

6 

7 

8 

9 

1

2 

3 

4 

5 

6 

7 

8 

9 

37 

3.7 

7.4 

11.1 

14.8 

18.5 
22.2 

25.9 

29.6 
33.3 

121 

122 

123 

124 

125 

126 

127 

128 
129 

08 279 
636 
991 

09 342 
691 

10 037 
380 
721 

11 059 

08 314 
672 

09 026 
377 
726 

10 072 
415 
755 

11093 

08 350 
707 

09 061 
412 
760 

10 106 
449 
789 

11 126 

386 

743 

09 096 
447 
795 

10 140 
483
823 

11 160 

422 

778 

09 132 
482 
830 

10 175 
517 
857 

11 193 

458 

814 

09 167 
517 
864 

10 209 
551 
890 

11 227 

493 

849 

09 202 
552 
899 

10 243 
585 
924 

11 261 

529 

884 

09 237 
587 
934 

10 278 
619 
958 

11 294 

565 

920 

09 272 
621 
968 

10 312 
653 
992 

11 327 

600 

955 

09 307 
656 

10 003 
346 
687 

11 025 
361 

130 

11 394 

11428 

11461 

11494 

11 528 

11 561 

11594 

11 628 

11 661 

11 694

34 

3.4 

6.8 

10.2 

13.6 
17.0 

20.4 
23.8 
27.2 

30.6 

31 

3.1 

6.2 
9.3 

12.4 

15.5 

18.6 

21.7 

24.8 

27.9 

33 

3.3 
6.6 
9.9 
13.2 
16.5
19.8 
23.1 
26.4 
29.7 

30 

3.0 

6.0 
9.0 

12.0 

15.0 

18.0 
21.0 

24.0 

27.0 

32 

3.2 

6.4 

9.6 
12.8 
16.0 

19.2 

22.4 

25.6 
28.8 

29 

2.9 

5.8 

8.7 

11.6 

14.5 
17.4 

20.3 
23.2 
26.1

131 

132 

133 

134 

135 

136 

137 

138 

139 

727 

12 057 
385 
710 

13 033 
354 
672 
988 

14301 

760 

12 090 
418 
743 

13 066 
386 
704 

14 019 
333 

793 

12 123 
450 
775 

13 098 
418 
735 

14 051 
364 

826 

12 156 
483 
808 

13 130 
450 
767 

14 082 
395 

860 

12 189 
516 
840 

13 162 
481 
799 

14 114 
426 

893 
12 222 
548 
872 

13 194 
513 
830 

14 145 
457 

926 

12 254 
581 
905 

13 226 
545 
862 

14 176 
489 

959 

12 287 
613 
937 

13 258
577
893

14 208
520 

992 

12 320 
646 
969 

13 290 
609 
925 

14 239 
551 

12 024 
352 
678 

13 001 
322 
640 
956 

14 270 
582 

140 

14 613 

14 644 

14 675 

14 706 

14 737 

14 768 

14 799 

14 829 

14 860 

14 891 

141 

142 

143 

144 

145 

146 

147 

148 

149 

922 
15 229 
534 
836 
16137 
435 
732 
17 026 
319 
17 609 

953 

15 259 
564 
866 

16 167 
465 
761 

17 056 
348 

983 

15 290 
594 
897 

16 197 
495 
791 

17 085 
377 

15 014 
320 
625 
927 

16 227 
524 
820 

17 114 
406 

15 045 
351 
655 
957 

16 256 
554 
850 

17 143 
435 

15 076 
381 
685 
987 

16 286 
584 
879 

17 173 
464 

15 106 
412 
715 

16 017 
316 
613 
909 

17 202 
493 

15 137 
442 
746 

16 047 
346 
643 
938 

17 231 
522 

15 168 
473 
776 

16 077 
376 
673 
967 

17 260 
551 

15 198 
503 
806 
16107 
406 
702 
997 
17 289 
580
17 869 

150 

17 638 

17 667 

17 696 

17 725 

17 754 

17 782 

17 811 

17 840 

N 

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 


1 Reprinted from The Mathematics of Finance, Houghton Mifflin Co., Boston, by permission of the publisher.
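
The proportional parts (P.P.) columns give tenths of the tabular difference between adjacent mantissas, so that a fifth significant figure can be handled by linear interpolation. The following is a minimal sketch in Python and is not part of the original text. The names `mantissa5` and `log_by_proportional_parts` are hypothetical, and the tabular mantissas are regenerated with `math.log10` rather than read from the printed table.

```python
import math


def mantissa5(n4: int) -> int:
    """Five-place mantissa of a four-digit number (1000..9999), as tabulated."""
    return round((math.log10(n4) - 3) * 100_000)


def log_by_proportional_parts(n5: int) -> float:
    """Approximate mantissa of a five-digit number (10000..99999).

    The first four digits select the tabular entry; the fifth digit selects
    that many tenths of the difference to the next entry (the P.P. value).
    """
    n4, fifth_digit = divmod(n5, 10)
    lower = mantissa5(n4)
    upper = mantissa5(n4 + 1)
    tabular_difference = upper - lower
    proportional_part = tabular_difference * fifth_digit / 10
    return (lower + proportional_part) / 100_000


if __name__ == "__main__":
    # 1.0025 lies halfway between the entries for 1002 and 1003.
    approx = log_by_proportional_parts(10025)
    exact = math.log10(1.0025)
    print(approx, exact)
```

For 10025 the entries are 00 087 and 00 130, the difference is 43, and the P.P. column under 43 gives 21.5 for the digit 5, so the interpolated mantissa is 00 108.5.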










Numbers 150-200 Logs 17609-30298


N | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | P.P.

150 

17 609 

17 638 

17 667 

17 696 

17 725 

17 754 

18 041 
327 
611 
893 

19 173 
451 
728 

20003 

276 

17 782 

17 811 

17840 

17 869 


29 

28 

151 

152 

153 

154 

155 

156 

157 

158 

159 

898 
18184 
469 
752 
19 033 
312 
590 
866 
20140 

926 

18 213 
498 
780 

19 061 
340 
618 
893 

20167 

955 

18 241 
526 
808 

19 089 
368 
645 
921 

20 194 

984 
18 270 
554 
837 
19117 
396 
673 
948 
20 222 

18 013 
298 
583 
865 

19 145 
424 
700 
976 

20 249 

18 070 
355 
639 
921 

19 201 
479 
756 

20 030 
303 

18 099 
384 
667 
949 

19 229 
507 
783 

20 058 
330 

18 127 
412 
696 
977 

19 257 
535 
811 

20 085 
358 

18 156 
441 
724 

19 005 
285 
562 
838 

20112 

385 

1 

2 

3 

4 

5 

6 

7 

8 

9 

2.9 

5.8 

8.7 

11.6 

14.5 

17.4 

20.3 

23.2 

26.1 

2.8 

5.6 

8.4 

11.2 

14.0 

16.8 

19.6 

22.4 

25.2 

160 

20 412 

20439 

20 466 

20 493 

20 520 

20 548 

20 575 

20 602 

20 629 

20 656 


27 

26 

161 

162 

163 

164 

165 

166 

167 

168 
169 

683 
952 
21219 
484 
748 
22 011 
272 
531 
789 

710 

978 

21 245 
511 
775 

22 037 
298 
557 
814 

737 

21 005 
272 
537 
801 

22 063 
324 
583 
840 

763 

21 032 
299 
564 
827 

22 089 
350 
608 
866 

790 

21059 

325 

590 

854 

22115 

376 

634 

891 

817 
21085 
352 
617 
880
22 141 
401 
660 
917 

844 
21 112 
378 
643 
906 
22 167 
427 
686 
943 

871 

21 139 
405 
669 
932 

22 194 
453 
712 
968 

898 

21 165 
431 
696 
958 

22 220 
479 
737 
994 


925 

21 192 
458 
722 
985 

22 246 
505 
763 

23 019 

1 

2 

3 

4 

5 

6 

7 

8 

9 

2.7 

5.4 

8.1 

10.8 

13.5 
16.2 
18.9 

21.6 
24.3 

2.6 

5.2 

7.8 

10.4 
13.0 

15.6
18.2 
20.8 

23.4 

170 

171 

172 

173 

174 

175 

176 

177 

178 

179 

23 045 

23 070 

23 096 

23 121 

23 147 

23 172 

23 198 

23 223 

23 249 

23 274 


25 

24 

300 
553 
805 
24055 
304 
551 
797 
25 042 
285 

325 

578 

830 

24 080 
329 
576 
822 

25 066 
310 

350 

603 

855 

24 105 
353 
601 
846 

25 091 
334 

376 

629 

880 

24 130 
378 

625
871 

25 115 
358 

401 

654 

905 

24 155 
403 

650

895 

25 139 
382 

426 
679 
930 
24180 
428 
674 
920 
25 164 
406

452 

704 

955 

24 204 
452 
699 
944 

25 188 
431 

477 

729 

980 

24 229 
477 
724 
969 

25 212 
455 

502 
754 
24005 
254 
502 
748 
993 
25 237 
479 


528

779 

24 030 
279 
527 
773

25 018 
261 
503 

1 

2 

3 

4 

5 

6 

7 

8 

9 

2.5 
5.0 

7.5 
10.0 

12.5 

15.0 

17.5 

20.0 

22.5 

2.4 

4.8 

7.2 

9.6 

12.0 

14.4 

16.8 

19.2 

21.6 

180 

25 527 

25 551 

25 575 

25 600 

25 624 

25 648 

25 672 

25696 

25 720 

25 744 


23 

22 

181 

182 

183 

184 

185 

186 

187 

188 
189 

768 

26 007 
245 
482 
717 
951 

27 184 
416 
646 

792 

26 031 
269 
505 
741 
975 

27 207 
439 
669 

816 

26 055 
293 
529 
764 
998 

27 231 
462 

692 

840 

26 079 
316 
553 
788 

27 021 
254 
485 
715 

864 
26 102 
340 
576 
811 
27 045 
277 
508 
738 

888 
26126 
364 
600 
834 
27 068 
300 
531 
761 

912 
26150 
387 
623 
858 
27 091 
323 
554 
784 

935 

26174 

411 

647 

881 

27114 

346 

577 

807 

959 

26 198 
435 
670 
905 

27 138 
370 
600 
830 

983 
26 221 
458 
694 
928 
27 161 
393 
623 
852 

1 

2 

3 

4 

5 

6 

7 

8 

9 

2.3 

4.6 

6.9 

9.2 

11.5 

13.8 

16.1 

18.4 

20.7 

2.2 

4.4 

6.6 

8.8 

11.0 

13.2 

15.4 

17.6 

19.8 

190 

27 875 

27 898 

27 921 

27 944

27 967 

27 989 

28 012 

28 035 

28 058 

28 081 


21 


191 

192 

193 

194 

195 

196 

197 
198
199 

28 103 
330 
556 
780 

29 003 
226 
447 
667 
885 

28126 
353 
578 
803 
29 026 
248 
469 
688 
907 

28 149 
375 
601 
825 

29 048 
270 
491 
710 
929 

28171 
398 
623 
847 
29 070 
292 
513 
732
951 

28 194 
421 
646 
870 

29 092 
314 
535 
754 
973 

28 217
443 
668 
892 

29 115 
336 
557 
776 
994 

240 

466 

691 

914 

29 137 
358 
579 
798 

30 016 

262 

488 

713 

937 

29 159 
380 
601 
820 

30 038 

285 
511 
735 
959 
29 181 
403 
623 
842 
30060 

307 

533 

758 

981 

29 203 
425 
645 
863 

30 081 

1 

2 

3 

4 

5 

6 

7 

8 

9 

2.1 

4.2 

6.3 

8.4 

10.5 

12.6 

14.7 

16.8 
18.9 

200 

30 103 

30125 

30 146 

30 168 

30 190 

30 211 

30 233 

30 255 

30 276 

30 298 




N 

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 







209 32 015 32 035 32 056 32 077 32 098 32 118 32 139 32 160 32 181 32 201 


N

0

201 

320 

202 

535 

203 

750 

204 

963 

205 

31 175 

206 

387 

207 

597 

208 

806 

209 

32 015 

210 

32 222 

211 

428 

212 

634 

213 

838 

214 

33 041 

215 

244 

216 

445 

217 

646 

218 

846 

219 

34 044 

220 

34 242 

221 

439 

222 

635 

223 

830 

224 

35 025

225 

218 

226 

411 

227 

603 

228 

793 

229 

984

230 

36 173

231 

361 

232 

549 

233 

736 

234 

922 

235 

37 107

236 

291 

237 

475 

238 

658 

239 

840 

240 

38 021

241 

202 

242 

382 

243 

561 

244 

739 

245 

917 

246 

39 094

247 

270 

248 

445 

249 

620 

250 

39 794

N 

0 


281 302 323 345 366


911 931 952 973 994


182 199 217 235 252


707 724 742 759 777


22

1 2.2
2 4.4
3 6.6
4 8.8
5 11.0
6 13.2
7 15.4
8 17.6
9 19.8

20

1 2.0
2 4.0
3 6.0
4 8.0
5 10.0
6 12.0
7 14.0
8 16.0
9 18.0

19

1 1.9
2 3.8
3 5.7
4 7.6
5 9.5
6 11.4
7 13.3
8 15.2
9 17.1

18

1 1.8
2 3.6
3 5.4
4 7.2
5 9.0
6 10.8
7 12.6
8 14.4
9 16.2

17

1 1.7
2 3.4
3 5.1
4 6.8
5 8.5
6 10.2
7 11.9
8 13.6
9 15.3



























Numbers 300-350 Logs 47712-54518 


N | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | P.P.


300 47 712 47 727 47 741 47 756 47 770 47 784 47 799 47 813 47 828 47 842 15

301 857 871 885 900 914 929 943 958 972 986 1 1.5

302 48001 48015 48029 48044 48058 48073 48087 48101 48116 48130 2 3.0

303 144 159 173 187 202 216 230 244 259 273 3 4.5

304 287 302 316 330 344 359 373 387 401 416 4 6.0

305 430 444 458 473 487 501 515 530 544 558 5 7.5

306 572 586 601 615 629 643 657 671 686 700 6 9.0

307 714 728 742 756 770 785 799 813 827 841 7 10.5

308 855 869 883 897 911 926 940 954 968 982 8 12.0

309 996 49010 49024 49038 49052 49066 49 080 49 094 49108 49 122 9 13.5

310 49 136 49150 49164 49178 49192 49 206 49 220 49234 49 248 49 262 14

311 276 290 304 318 332 346 360 374 388 402 1 1.4

312 415 429 443 457 471 485 499 513 527 541 2 2.8

313 554 568 582 596 610 624 638 651 665 679 3 4.2

314 693 707 721 734 748 762 776 790 803 817 4 5.6

315 831 845 859 872 886 900 914 927 941 955 5 7.0

316 969 982 996 50010 50024 50037 50051 50065 50079 50092 6 8.4

317 50106 50120 50133 147 161 174 188 202 215 229 7 9.8

318 243 256 270 284 297 311 325 338 352 365 8 11.2

319 379 393 406 420 433 447 461 474 488 501 9 12.6

320 50515 50529 50542 50556 50569 50583 50596 50610 50623 50637

321 651 664 678 691 705 718 732 745 759 772

322 786 799 813 826 840 853 866 880 893 907 

323 920 934 947 961 974 987 51001 51014 51028 51041 

324 51055 51068 51081 51095 51 108 51 121 135 148 162 175 

325 188 202 215 228 242 255 268 282 295 308

326 322 335 348 362 375 388 402 415 428 441 

327 455 468 481 495 508 521 534 548 561 574 

328 587 601 614 627 640 654 667 680 693 706 

329 720 733 746 759 772 786 799 812 825 838 

330 51 851 51 865 51 878 51 891 51 904 51917 51 930 51943 51 957 51 970 13

331 983 996 52 009 52 022 52 035 52 048 52 061 52 075 52 088 52 101 1 1.3

332 52 114 52127 140 153 166 179 192 205 218 231 2 2.6

333 244 257 270 284 297 310 323 336 349 362 3 3.9

334 375 388 401 414 427 440 453 466 479 492 4 5.2

335 504 517 530 543 556 569 582 595 608 621 5 6.5

336 634 647 660 673 686 699 711 724 737 750 6 7.8

337 763 776 789 802 815 827 840 853 866 879 7 9.1

338 892 905 917 930 943 956 969 982 994 53 007 8 10.4

339 53020 53033 53046 53058 53071 53084 53097 53110 53122 135 9 11.7

340 53 148 53 161 53 173 53 186 53 199 53 212 53224 53 237 53 250 53 263 12

341 275 288 301 314 326 339 352 364 377 390 1 1.2

342 403 415 428 441 453 466 479 491 504 517 2 2.4

343 529 542 555 567 580 593 605 618 631 643 3 3.6

344 656 668 681 694 706 719 732 744 757 769 4 4.8

345 782 794 807 820 832 845 857 870 882 895 5 6.0

346 908 920 933 945 958 970 983 995 54008 54020 6 7.2

347 54033 54045 54058 54070 54083 54095 54108 54120 133 145 7 8.4

348 158 170 183 195 208 220 233 245 258 270 8 9.6

349 283 295 307 320 332 345 357 370 382 394 9 10.8

350 54407 54419 54432 54444 54456 54469 54481 54494 54506 54518

N 0 1 2 3 4 5 6 7 8 9










Numbers 350-400 Logs 54407-60304


N | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | P.P.

350 

54407 

54419 

54432 

54444 

54456 

54469 

54481 

54494 

54 506 

54 518 

13 

351 

352 

353 

354 

355 

356 

357 

358 

359 

531 

654 

777 

900 

55023 

145 

267 

388 

509 

543 
667 
790 
913 
55 035 
157 
279 
400 
522 

555 
679 
802 
925 
55 047 
169 
291 
413 
534 

568 
691 
814 
937 
55 060 
182 
303 
425 
546 

580 

704 

827 

949 

55072 

194 

315 

437 

558 

593 
716 
839 
962 
55 084 
206 
328 
449 
570 

605 

728 

851 

974 

55096 

218 

340 

461 

582 

617 
741 
864 
986 
55 108 
230 
352 
473 
594 

630 
753 
876 
998 
55 121 
242 
364 
485 
606 

642 

765 

888 

55011 

133 

255 

376 

497 

618 

1 

2 

3 

4 

5 

6 

7 

8 

9 

1.3 

2.6 

3.9 

5.2 

6.5 

7.8 

9.1 

10.4 

11.7 

360 

55 630 

55 642 

55 654 

55 666 

55 678 

55 691 

55 703 

55 715 

55 727 

55 739


12 

361 

751 

763

775 

787 

799 

811 

823 

835 

847 

859 


1

1.2

362 

871 

883 

895 

907 

919 

931 

943 

955 

967 

979 

2 

2.4

363 

991 

56003 

56015 

56027 

56038 

56050 

56062 

56074 

56086 

56098 

3 

3.6 

364 

56110 

122 

134 

146 

158 

170 

182 

194 

205 

217 

4 

4.8 

365 

229 

241 

253 

265 

277 

289 

301 

312 

324 

336 

5 

6.0 

366 

348 

360 

372 

384 

396 

407 

419 

431

443 

455 

6 

7.2 

367 

467 

478 

490 

502 

514 

526 

538 

549 

561 

573 

7 

8.4 

368 

585 

597 

608 

620 

632 

644 

656 

667 

679 

691 

8 

9.6 

369 

703 

714 

726 

738 

750 

761 

773 

785 

797 

808 

9 

10.8 

370 

56 820 

56 832 

56 844 

56 855 

56867 

56 879 

56891 

56 902 

56 914 

56926 



371 

937 

949 

961 

972 

984 

996 

57008 

57 019 

57 031

57 043 



372

57 054 

57 066 

57 078 

57 089 

57 101 

57113 

124 

136 

148 

159 



373 

171 

183 

194 

206 

217 

229 

241 

252 

264 

276 



374 

287 

299 

310 

322 

334 

345 

357 

368 

380 

392 



375 

403 

415 

426 

438 

449 

461 

473 

484 

496 

507 



376 

519 

530

542 

553 

565 

576 

588 

600 

611 

623 



377 

634 

646

657 

669 

680 

692 

703 

715 

726 

738 



378 

749 

761 

772 

784 

795 

807 

818 

830 

841 

852 



379 

864 

875 

887

898

910

921

933

944

955

967



380 

57 978

57 990

58001 

58013 

58024 

58 035 

58 047 

58 058 

58 070 

58 081 


11

381 

58092 

58 104 

115 

127 

138 

149 

161 

172 

184 

195

1

1.1 

382 

206 

218 

229 

240 

252 

263 

274 

286

297 

309 

2 

2.2 

383 

320 

331 

343 

354 

365 

377 

388 

399 

410 

422 

3 

3.3 

384 

433 

444 

456 

467 

478 

490 

501 

512 

524 

535 

4 

4.4 

385 

546 

557 

569 

580 

591 

602 

614 

625 

636 

647 

5 

5.5 

386 

659 

670 

681 

692 

704 

715 

726 

737 

749 

760 

6 

6.6 

387 

771 

782 

794 

805 

816 

827 

838 

850 

861 

872 

7 

7.7 

388 

883 

894 

906

917 

928 

939 

950 

961 

973 

984 

8 

8.8 

389 

995 

59 006 

59 017

59028 

59 040 

59051 

59062 

59073 

59084 

59095 

9 

9.9 

390 

59 106 

59118 

59129 

59140 

59 151 

59 162 

59173 

59 184

59 195 

59 207 


10 

391 

218 

229 

240 

251 

262 

273 

284 

295 

306 

318 

1

1.0 

392 

329 

340 

351 

362 

373 

384 

395

406 

417 

428 

2 

2.0 

393 

439 

450 

461 

472 

483 

494 

506 

517 

528 

539 

3 

3.0 

394 

550 

561 

572 

583 

594 

605 

616 

627 

638 

649 

4 

4.0 

395 

660 

671 

682 

693 

704 

715 

726 

737 

748 

759 

5 

5.0 

396 

770 

780 

791 

802 

813 

824 

835 

846 

857 

868 

6 

6.0 

397 

879 

890 

901 

912 

923 

934 

945 

956 

966

977 

7 

7.0 

398 

988 

999 

60010 

60021 

60032 

60043 

60054 

60065 

60076 

60 086 

8 

8.0 

399 

60 097 

60108 

119

130

141

152

163 

173 

184 

195 

9 

9.0 

400

60206 

60 217 

60 228 

60 239 

60 249 

60 260 

60 271 

60 282 

60293 

60304 



N

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 







Numbers 400-450 Logs 60206-65408

N | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | P.P.

400 60 206 60217 60228 60239 60249 60260 60271 60282 60293 60304 11

401 314 325 336 347 358 369 379 390 401 412 1 1.1

402 423 433 444 455 466 477 487 498 509 520 2 2.2

403 531 541 552 563 574 584 595 606 617 627 3 3.3

404 638 649 660 670 681 692 703 713 724 735 4 4.4 

405 746 756 767 778 788 799 810 821 831 842 5 5.5 

406 853 863 874 885 895 906 917 927 938 949 6 6.6 

407 959 970 981 991 61002 61013 61023 61034 61045 61055 7 7.7 

408 61066 61077 61087 61098 109 119 130 140 151 162 8 8.8

409 172 183 194 204 215 225 236 247 257 268 9 9.9 

410 61 278 61 289 61300 61310 61 321 61 331 61342 61 352 61 363 61 374

411 384 395 405 416 426 437 448 458 469 479

412 490 500 511 521 532 542 553 563 574 584 

413 595 606 616 627 637 648 658 669 679 690 

414 700 711 721 731 742 752 763 773 784 794 

415 805 815 826 836 847 857 868 878 888 899 

416 909 920 930 941 951 962 972 982 993 62003 

417 62 014 62024 62034 62 045 62055 62066 62076 62086 62097 107

418 118 128 138 149 159 170 180 190 201 211 

419 221 232 242 252 263 273 284 294 304 315 

420 62 325 62 335 62346 62 356 62 366 62 377 62 387 62397 62408 62418 10

421 428 439 449 459 469 480 490 500 511 521 1 1.0

422 531 542 552 562 572 583 593 603 613 624 2 2.0

423 634 644 655 665 675 685 696 706 716 726 3 3.0

424 737 747 757 767 778 788 798 808 818 829 4 4.0

425 839 849 859 870 880 890 900 910 921 931 5 5.0 

426 941 951 961 972 982 992 63 002 63 012 63 022 63 033 6 6.0 

427 63 043 63 053 63 063 63 073 63 083 63 094 104 114 124 134 7 7.0

428 144 155 165 175 185 195 205 215 225 236 8 8.0

429 246 256 266 276 286 296 306 317 327 337 9 9.0

430 63 347 63357 63367 63 377 63 387 63 397 63 407 63 417 63 428 63 438

431 448 458 468 478 488 498 508 518 528 538 

432 548 558 568 579 589 599 609 619 629 639 

433 649 659 669 679 689 699 709 719 729 739 

434 749 759 769 779 789 799 809 819 829 839 

435 849 859 869 879 889 899 909 919 929 939 

436 949 959 969 979 988 998 64008 64018 64028 64038

437 64048 64058 64068 64078 64088 64098 108 118 128 137 

438 147 157 167 177 187 197 207 217 227 237 

439 246 256 266 276 286 296 306 316 326 335

440 64345 64355 64365 64375 64385 64395 64404 64414 64424 64434 9

441 444 454 464 473 483 493 503 513 523 532 1 0.9

442 542 552 562 572 582 591 601 611 621 631 2 1.8

443 640 650 660 670 680 689 699 709 719 729 3 2.7

444 738 748 758 768 777 787 797 807 816 826 4 3.6

445 836 846 856 865 875 885 895 904 914 924 5 4.5

446 933 943 953 963 972 982 992 65002 65011 65021 6 5.4 

447 65 031 65 040 65 050 65 060 65 070 65 079 65 089 099 108 118 7 6.3 

448 128 137 147 157 167 176 186 196 205 215 8 7.2 

449 225 234 244 254 263 273 283 292 302 312 9 8.1 

450 65321 65331 65341 65 350 65 360 65369 65379 65389 65 398 65408

N 0 1 2 3 4 5 6 7 8 9










0

1

2

3

65321 

65 331

65341 

65 350

418 

427 

437 

447 

514 

523 

533 

543 

610 

619 

629 

639 

706 

715 

725 

734 

801 

811 

820 

830 

896 

906 

916 

925 

992 

66 001 

66011 

66020

66087 

096 

106 

115 

181 

191 

200 

210 

66276 

66 285 

66 295 

66 304

370

380

389

398

464 

474 

483 

492 

558 

567 

577 

586

652 

661 

671 

680

745 

755 

764 

773 

839 

848

857 

867 

932 

941 

950 

960 

67 025 

67034 

67043 

67052

117 

127 

136 

145 





67 210 

67 219 

67 228 

67 237

302 

311 

321 

330 

394 

403 

413 

422 

486 

495 

504 

514 

578 

587 

596 

605 

669 

679 

688

697 

761 

770 

779 

788 

852 

861 

870 

879 

943 

952 

961

970

68 034

68 043

68 052

68 061

68 124

68 133

68 142

68 151

215 

224 

233 

242 

305 

314 

323 

332

395 

404 

413 

422 

485 

494 

502 

511 

574 

583 

592 

601 

664 

673 

681 

690 

753 

762 

771 

780 

842 

851 

860 

869 

931

940

949

958

69 020

69 028

69 037

69 046

108

117 

126 

135 

197 

205 

214 

223 

285 

294 

302 

311 

373 

381 

390 

399 

461 

469 

478 

487 

548 

557 

566 

574 

636 

644 

653 

662 

723 

732 

740 

749 

810 

819 

827 

836 

69897

69906 

69914 

o 

69 923 

3 

0 

1 

u 



Numbers 450-500 Logs 65321-69975


8 


66323 66332 66342 66351 66361
417 427 436 445 455 

511 521 530 539 549 

605 614 624 633 642 

699 708 717 727 736 

792 801 811 820 829 

885 894 904 913 922 

978 987 997 67006 67015 

67071 67080 67089 099 108 

164 173 182 191 201


67 256 67 265 
348 357 

440 449 

532 541 


67 274 67 284 67 293 

367 376 385

459 468 477 

550 560 569 

642 651 660 

733 742 752 

825 834 843 

916 925 934 

68 006 68 015 68 024

097 106 115

68187 68 196 68205
278 287 296

368 377 386 

458 467 476 

547 556 565 

637 646 655 

726 735 744 

815 824 833 

904 913 922 

993 69 002 69 011

69 082 69 090 69 099
170 179 188 

258 267 276 

346 355 364 

434 443 452 

522 531 539 

609 618 627 

697 705 714 

784 793 801 

871 880 888

69958 69966 69 975
7 8 9 










Numbers 500-550 Logs 69897-74107



N | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | P.P.


500 

69897 

69 906 

69 914 

69923 

69 932 69940 

69 949 

69 958 

69966 

69975 


9 

501 

984 

992 

70001 

70 010 

70018 70027 

70036 

70044 

70053 

70062 

1 

0.9

502 

70 070 

70079 

088 

096 

105 114 

122 

131 

140 

148 

2 

1.8

503 

157 

165 

174 

183 

191 200 

209 

217 

226 

234 

3 

2.7 

504 

243 

252 

260 

269 

278 286 

295 

303 

312 

321 

4 

3.6 

505 

329 

338 

346 

355 

364 372 

381 

389 

398 

406 

5 

4.5 

506 

415 

424 

432 

441 

449 458 

467 

475 

484 

492 

6 

5.4 

507 

501 

509 

518 

526 

535 544 

552 

561 

569 

578 

7 

6.3 

508 

586 

595 

603 

612 

621 629 

638 

646 

655 

663 

8 

7.2 

509 

672 

680 

689 

697 

706 714 

723 

731 

740 

749 

9 

8.1 

510 

70 757 

70 766 

70 774 

70 783 

70 791 70 800 

70 808 

70817 

70825 

70 834 



511 

842 

851 

859 

868 

876 885 

893 

902 

910 

919 



512 

927 

935 

944 

952 

961 969 

978 

986 

995 

71003 



513 

71012 

71020 

71029 

71037 

71046 71054 

71063 

71071 

71079 

088 



514 

096 

105 

113 

122 

130 139 

147 

155 

164 

172 



515 

181 

189 

198 

206 

214 223 

231 

240 

248 

257 



516 

265 

273 

282 

290 

299 307 

315 

324 

332 

341 



517 

349 

357 

366 

374 

383 391 

399 

408 

416 

425 



518 

433 

441 

450 

458 

466 475 

483 

492 

500 

508 



519 

517 

525 

533 

542 

550 559 

567 

575 

584 

592 



520 

71600 

71609 

71617 

71 625 

71 634 71 642 

71 650 

71659 

71667 

71675 

8 

521 

684 

692 

700 

709 

717 725 

734 

742 

750 

759 

1 

0.8

522 

767 

775 

784 

792 

800 809 

817 

825 

834 

842 

2 

1.6

523 

850 

858 

867 

875 

883 892 

900 

908 

917 

925 

3 

2.4 

524 

933 

941 

950 

958 

966 975 

983 

991 

999 

72 008 

4 

3.2 

525 

72 016 

72024 

72 032 

72 041 

72 049 72 057 

72066 

72 074 

72 082 

090 

5 

4.0 

526 

099 

107 

115 

123 

132 140 

148 

156 

165 

173 

6 

4.8 

527 

181 

189 

198 

206 

214 222 

230 

239 

247 

255 

7 

5.6 

528 

263 

272 

280 

288 

296 304 

313 

321 

329 

337 

8 

6.4 

529 

346 

354 

362 

370 

378 387 

395 

403 

411 

419 

9 

7.2 

530 

72 428 

72 436 

72 444 

72452 

72 460 72 469 

72477 

72485 

72493 

72 501 



531 

509 

518 

526 

534 

542 550 

558 

567 

575 

583 



532 

591 

599 

607 

616 

624 632 

640 

648 

656 

665 



533 

673 

681 

689 

697 

705 713 

722 

730 

738 

746 



534 

754 

762 

770 

779 

787 795 

803 

811 

819 

827 



535 

835 

843 

852 

860 

868 876 

884 

892 

900 

908 



536 

916 

925 

933 

941 

949 957 

965 

973 

981 

989 



537 

997 

73006 

73 014 

73 022 

73 030 73 038 

73046 

73054 

73062 

73 070 



538 

73078 

086 

094 

102 

111 119 

127 

135 

143 

151 



539 

159 

167 

175 

183 

191 199 

207 

215 

223 

231 



540 

73 239 

73 247 

73 255 

73 263 

73 272 73 280 

73 288 

73 296 

73304 

73 312 

7 

541 

320 

328 

336 

344 

352 360 

368 

376 

384 

392 

1 

0.7

542 

400 

408 

416 

424 

432 440 

448 

456 

464 

472 

2

1.4

543 

480 

488 

496 

504 

512 520 

528 

536 

544 

552 

3 

2.1 

544 

560 

568 

576 

584 

592 600 

608 

616 

624 

632 

4 

2.8 

545 

640 

648 

656 

664 

672 679 

687 

695 

703 

711 

5 

3.5 

546 

719 

727 

735 

743 

751 759 

767 

775 

783 

791 

6 

4.2 

547 

799 

807 

815 

823 

830 838 

846 

854 

862 

870 

7 

4.9 

548 

878 

886 

894 

902 

910 918 

926 

933 

941 

949 

8 

5.6 

549 

957 

965 

973 

981 

989 997 

74005 

74013 

74020 

74028 

9 

6.3 

550 

74 036 

74 044 

74052 

74 060 

74068 74076 

74084 

74092 

74099 

74107 



N 

0 

1 

2 

3 

4 5 

6 

7 

8 

9 











N | 0 | 1 | 2 | 3


550 74036 74044 74052 74060 

551 115 123 131 139 

552 194 202 210 218 

553 273 280 288 296 

554 351 359 367 374 

555 429 437 445 453 

556 507 515 523 531 

557 586 593 601 609 

558 663 671 679 687 

559 741 749 757 764 

560 74 819 74 827 74834 74842 

561 896 904 912 920 

562 974 981 989 997 

563 75051 75059 75066 75074 

564 128 136 143 151 

565 205 213 220 228 

566 282 289 297 305 

567 358 366 374 381 

568 435 442 450 458 

569 511 519 526 534 

570 75 587 75 595 75 603 75 610

571 664 671 679 686 

572 740 747 755 762 

573 815 823 831 838 

574 891 899 906 914 

575 967 974 982 989 

576 76042 76050 76057 76065 

577 118 125 133 140 

578 193 200 208 215 

579 268 275 283 290 

580 76343 76350 76358 




Numbers 550-600 Logs 74036-77880 


581 418 

582 492 

583 567 

584 641 

585 716 

586 790 

587 864 

588 938 

589 77012 

590 77 085 

591 159 

592 232 

593 305 

594 379 

595 452 

596 525 

597 597 

598 670 

599 743 

600 77 815 


425 433 440 

500 507 515 

574 582 589 

649 656 664 

723 730 738 

797 805 812 

871 879 886 

945 953 960 

77019 77026 77034

77 093 77 100 77 107 

166 173 181 

240 247 254 

313 320 327 

386 393 401 

459 466 474 

532 539 546 

605 612 619 

677 685 692 

750 757 764 

77 822 77 830 77 837


74076 

74084 

74 092 

74099 

74 107 

155 

162 

170 

178 

186 

233 

241 

249 

257 

265 

312 

320 

327 

335 

343 

390 

398 

406 

414 

421 

468 

476 

484 

492 

500 

547 

554 

562 

570 

578 

624 

632 

640 

648 

656 

702 

710 

718 

726 

733 

780 

788 

796 

803 

811 

74858 

74865 

74 873 

74 881 

74 889 

935 

943 

950 

958 

966 

75012 

75 020 

75028 

75 035 

75043 

089 

097 

105 

113 

120 

166 

174 

182 

189 

197 

243 

251 

259 

266 

274 

320 

328 

335 

343 

351 

397 

404 

412 

420 

427 

473 

481 

488 

496 

504 

549 

557 

565 

572 

580 

75626 

75 633 

75 641 

75648 

75 656 

702 

709 

717 

724 

732 

778 

785 

793 

800 

808 

853 

861 

868 

876 

884 

929 

937 

944 

952 

959 

76005 

76012 

76020 

76027 

76035 

080 

087 

095 

103 

110 

155 

163 

170 

178 

185 

230 

238 

245 

253 

260 

__ 305 _ 

313 

320 

328 

335 . 

76380 

76388 

76395 

76403 

76410 ‘ 

455 

462 

470 

1 477 

485 

530 

537 

545 

552 

559 

604 

612 

619 

626 

634 

678 

686 

693 

701 

708 

753 

760 

768 

775 

782 

827 

834 

842 

849 

856 

901 

908 

916 

923 

930 

975 

982 

989 

997 

77 004 

77048 

77056 

77063 

77070 

078 

77 122 

77 1 29 _ 

77 137 

77144 

77 151 

195 

203 

210 

217 

225 

269 

276 

283 

291 

298 

342 

349 

357 

364 

371 

415 

422 

430 

437 

444 

488 

495 

503 

510 

517 

561 

568 

576 

583 

590 

634 

641 

648 

656 

663 

706 

714 

721 

728 

735 

779 

786 

793 

801 

808 

77 851 

77 859 

77 866 

77 873 

77 880 

5 

6 

7 

8 

9 




Numbers 600-650 Logs 77815-81351 


3 


604 104 111 

605 176 183 

606 247 254 

607 319 326 

608 390 398 

609 462 469

610 78533 78 540 

611 604 611 

612 675 682 

613 746 753 

614 817 824 

615 888 895 

616 958 965 

617 79029 79 036 

618 099 106 

619 169 176

620 79 239 79 246 


624 518 525 

625 588 595 

626 657 664 

627 727 734 

628 796 803 

629 865 872

630 79 934 79941 

631 80003 80 010 

632 072 079 

633 140 147 

634 209 216 

635 277 284 

636 346 353 

637 414 421 

638 482 489 

639 550 557

640 80 618 80625 


79 948 79 955 79 962 79 969 79 975 


641 686 

642 754 

643 821 

644 889 

645 956 

646 81023 

647 090 

648 158 

649 224

650 81291

N 0 


1 6 

7 

8 

9 II 

77 859 

77 866 

77 873 

77 880 

931 

938 

945 

952 

78003 

78010 

78017 

78025 

075 

082 

089 

097 

147 

154 

161 

168 

219 

226 

233 

240 

290 

297 

305 

312 

362 

369 

376 

383 

433 

440 

447 

455 

504 

512 

519 

526 

78 576 

78583 

78590 

78597 " 

647 

654 

661 

668 

718 

725 

732 

739 

789 

796 

803 

810 

859 

866 

873 

880 

930 

937 

944 

951 

79000 

79007 

79014 

79021 

071 

078 

085 

092 

141 

148 

155 

162 

211 

218 

225 

232 

79 281 

79 288 

79 295 

79 302 ' 

351 

358 

365 

372 - 

421 

428 

435 

442 

491 

498 

505 

511 

560 

567 

574 

581 

630 

637 

644 

650 

699 

706 

713 

720 

768 

775 

782 

789 

837 

844 

851 

858 

_ 906 

?1_3 

920 

927 _ 

79975 

79982 

79989 

79 996 ~ 

80044 

80051 

80058 

80065 

113 

120 

127 

134 

182 

188 

195 

202 

250 

257 

264 

271 

318 

325 

332 

339 

387 

393 

400 

407 

455 

462 

468 

475 

523 

530 

536 

543 

591 

598 

_604 

61 1 _ 

80659 

80665 

80672 

80679 r 

726 

733 

740 

’ 747 - 

794 

801 

808 

814 

862 

868 

875 

882 

929 

936 

943 

949 

996 

81003 

81010 

81017 

81 064 

070 

077 

084 

131 

137 

144 

151 

198 

204 

211 

218 

_ 265 

271 

278 

285 _ 

81 331 

81 338 

81345 

81351 ~ 

6 

7 

8 

9 










Numbers 650-700 Logs 81291-84566

  N      0      1      2      3      4      5      6      7      8      9    P.P.
650  81291  81298  81305  81311  81318  81325  81331  81338  81345  81351      7
651    358    365    371    378    385    391    398    405    411    418    1 0.7
652    425    431    438    445    451    458    465    471    478    485    2 1.4
653    491    498    505    511    518    525    531    538    544    551    3 2.1
654    558    564    571    578    584    591    598    604    611    617    4 2.8
655    624    631    637    644    651    657    664    671    677    684    5 3.5
656    690    697    704    710    717    723    730    737    743    750    6 4.2
657    757    763    770    776    783    790    796    803    809    816    7 4.9
658    823    829    836    842    849    856    862    869    875    882    8 5.6
659    889    895    902    908    915    921    928    935    941    948    9 6.3
660  81954  81961  81968  81974  81981  81987  81994  82000  82007  82014
661  82020  82027  82033  82040  82046  82053  82060    066    073    079
662    086    092    099    105    112    119    125    132    138    145
663    151    158    164    171    178    184    191    197    204    210
664    217    223    230    236    243    249    256    263    269    276
665    282    289    295    302    308    315    321    328    334    341
666    347    354    360    367    373    380    387    393    400    406
667    413    419    426    432    439    445    452    458    465    471
668    478    484    491    497    504    510    517    523    530    536
669    543    549    556    562    569    575    582    588    595    601
670  82607  82614  82620  82627  82633  82640  82646  82653  82659  82666      6
671    672    679    685    692    698    705    711    718    724    730    1 0.6
672    737    743    750    756    763    769    776    782    789    795    2 1.2
673    802    808    814    821    827    834    840    847    853    860    3 1.8
674    866    872    879    885    892    898    905    911    918    924    4 2.4
675    930    937    943    950    956    963    969    975    982    988    5 3.0
676    995  83001  83008  83014  83020  83027  83033  83040  83046  83052    6 3.6
677  83059    065    072    078    085    091    097    104    110    117    7 4.2
678    123    129    136    142    149    155    161    168    174    181    8 4.8
679    187    193    200    206    213    219    225    232    238    245    9 5.4
680  83251  83257  83264  83270  83276  83283  83289  83296  83302  83308
681    315    321    327    334    340    347    353    359    366    372
682    378    385    391    398    404    410    417    423    429    436
683    442    448    455    461    467    474    480    487    493    499
684    506    512    518    525    531    537    544    550    556    563
685    569    575    582    588    594    601    607    613    620    626
686    632    639    645    651    658    664    670    677    683    689
687    696    702    708    715    721    727    734    740    746    753
688    759    765    771    778    784    790    797    803    809    816
689    822    828    835    841    847    853    860    866    872    879
690  83885  83891  83897  83904  83910  83916  83923  83929  83935  83942
691    948    954    960    967    973    979    985    992    998  84004
692  84011  84017  84023  84029  84036  84042  84048  84055  84061    067
693    073    080    086    092    098    105    111    117    123    130
694    136    142    148    155    161    167    173    180    186    192
695    198    205    211    217    223    230    236    242    248    255
696    261    267    273    280    286    292    298    305    311    317
697    323    330    336    342    348    354    361    367    373    379
698    386    392    398    404    410    417    423    429    435    442
699    448    454    460    466    473    479    485    491    497    504
700  84510  84516  84522  84528  84535  84541  84547  84553  84559  84566
  N      0      1      2      3      4      5      6      7      8      9







Numbers 700-750 Logs 84510-87558 

  N      0      1      2      3      4      5      6      7      8      9    P.P.
700  84510  84516  84522  84528  84535  84541  84547  84553  84559  84566      7
701    572    578    584    590    597    603    609    615    621    628    1 0.7
702    634    640    646    652    658    665    671    677    683    689    2 1.4
703    696    702    708    714    720    726    733    739    745    751    3 2.1
704    757    763    770    776    782    788    794    800    807    813    4 2.8
705    819    825    831    837    844    850    856    862    868    874    5 3.5
706    880    887    893    899    905    911    917    924    930    936    6 4.2
707    942    948    954    960    967    973    979    985    991    997    7 4.9
708  85003  85009  85016  85022  85028  85034  85040  85046  85052  85058    8 5.6
709    065    071    077    083    089    095    101    107    114    120    9 6.3
710  85126  85132  85138  85144  85150  85156  85163  85169  85175  85181
711    187    193    199    205    211    217    224    230    236    242
712    248    254    260    266    272    278    285    291    297    303
713    309    315    321    327    333    339    345    352    358    364
714    370    376    382    388    394    400    406    412    418    425
715    431    437    443    449    455    461    467    473    479    485
716    491    497    503    509    516    522    528    534    540    546
717    552    558    564    570    576    582    588    594    600    606
718    612    618    625    631    637    643    649    655    661    667
719    673    679    685    691    697    703    709    715    721    727
720  85733  85739  85745  85751  85757  85763  85769  85775  85781  85788      6
721    794    800    806    812    818    824    830    836    842    848    1 0.6
722    854    860    866    872    878    884    890    896    902    908    2 1.2
723    914    920    926    932    938    944    950    956    962    968    3 1.8
724    974    980    986    992    998  86004  86010  86016  86022  86028    4 2.4
725  86034  86040  86046  86052  86058    064    070    076    082    088    5 3.0
726    094    100    106    112    118    124    130    136    141    147    6 3.6
727    153    159    165    171    177    183    189    195    201    207    7 4.2
728    213    219    225    231    237    243    249    255    261    267    8 4.8
729    273    279    285    291    297    303    308    314    320    326    9 5.4
730  86332  86338  86344  86350  86356  86362  86368  86374  86380  86386
731    392    398    404    410    415    421    427    433    439    445
732    451    457    463    469    475    481    487    493    499    504
733    510    516    522    528    534    540    546    552    558    564
734    570    576    581    587    593    599    605    611    617    623
735    629    635    641    646    652    658    664    670    676    682
736    688    694    700    705    711    717    723    729    735    741
737    747    753    759    764    770    776    782    788    794    800
738    806    812    817    823    829    835    841    847    853    859
739    864    870    876    882    888    894    900    906    911    917
740  86923  86929  86935  86941  86947  86953  86958  86964  86970  86976      5
741    982    988    994    999  87005  87011  87017  87023  87029  87035    1 0.5
742  87040  87046  87052  87058    064    070    075    081    087    093    2 1.0
743    099    105    111    116    122    128    134    140    146    151    3 1.5
744    157    163    169    175    181    186    192    198    204    210    4 2.0
745    216    221    227    233    239    245    251    256    262    268    5 2.5
746    274    280    286    291    297    303    309    315    320    326    6 3.0
747    332    338    344    349    355    361    367    373    379    384    7 3.5
748    390    396    402    408    413    419    425    431    437    442    8 4.0
749    448    454    460    466    471    477    483    489    495    500    9 4.5
750  87506  87512  87518  87523  87529  87535  87541  87547  87552  87558
  N      0      1      2      3      4      5      6      7      8      9











0 

1 

2 

87 506 

87 512 

87 518 

564 

570 

576 

622 

628 

633 

679 

685 

691 

737 

743 

749 

795 

800 

806 

852 

858 

864 

910 

915 

921 

967 

973 

978 

88024 

88030 

88036 

88081 

88087 

88093 

138 

144 

150 

195 

201 

207 

252 

258 

264 

309 

315 

321 

366 

372 

377 

423 

429 

434 

480 

485 

491 

536 

542 

547 

593 

598 

604 

88 649 

88 655 

88 660 

705 

711 

717 

762 

767 

773 

818 

824 

829 

874 

880 

885 

930 

936 

941 

986 

992 

997 1 

89042 

89 048 

89053 

098 

104 

109 

154 

159_ 

165 

89 209 

89 215 

89 221 J 

~ 265 

271 

276 

321 

326 

332 

376 

382 

387 

432 

437 

443 

487 

492 

498 

542 

548 

553 

597 

603 

609 

653 

658 

664 

70S 

713 

719 

89 763 

89 768 

89 774 ! 

818 

823 

829 

873 

878 

883 

927 

933 

938 

982 

988 

993 

90037 

90042 

90048 ! 

091 

097 

102 

146 

151 

157 

200 

206 

211 

255 

260 

266 

90309 

90314 

90320 1 

0 

1 

2 


Numbers 750-800 Logs 87506-90358 


  N      6      7      8      9    P.P.
750  87541  87547  87552  87558      6
751    599    604    610    616    1 0.6
752    656    662    668    674    2 1.2
753    714    720    726    731    3 1.8
754    772    777    783    789    4 2.4
755    829    835    841    846    5 3.0
756    887    892    898    904    6 3.6
757    944    950    955    961    7 4.2
758  88001  88007  88013  88018    8 4.8
759    058    064    070    076    9 5.4
760  88116  88121  88127  88133
761    173    178    184    190
762    230    235    241    247
763    287    292    298    304
764    343    349    355    360
765    400    406    412    417
766    457    463    468    474
767    513    519    525    530
768    570    576    581    587
769    627    632    638    643
770  88683  88689  88694  88700      5
771    739    745    750    756    1 0.5
772    795    801    807    812    2 1.0
773    852    857    863    868    3 1.5
774    908    913    919    925    4 2.0
775    964    969    975    981    5 2.5
776  89020  89025  89031  89037    6 3.0
777    076    081    087    092    7 3.5
778    131    137    143    148    8 4.0
779    187    193    198    204    9 4.5
780  89243  89248  89254  89260
781    298    304    310    315
782    354    360    365    371
783    409    415    421    426
784    465    470    476    481
785    520    526    531    537
786    575    581    586    592
787    631    636    642    647
788    686    691    697    702
789    741    746    752    757
790  89796  89801  89807  89812
791    851    856    862    867
792    905    911    916    922
793    960    966    971    977
794  90015  90020  90026  90031
795    069    075    080    086
796    124    129    135    140
797    179    184    189    195
798    233    238    244    249
799    287    293    298    304
800  90342  90347  90352  90358
  N      6      7      8      9








Numbers 800-850 Logs 90309-92988 


  N      0      1      2      3      4      5      6      7      8      9    P.P.
800  90309  90314  90320  90325  90331  90336  90342  90347  90352  90358      6
801    363    369    374    380    385    390    396    401    407    412    1 0.6
802    417    423    428    434    439    445    450    455    461    466    2 1.2
803    472    477    482    488    493    499    504    509    515    520    3 1.8
804    526    531    536    542    547    553    558    563    569    574    4 2.4
805    580    585    590    596    601    607    612    617    623    628    5 3.0
806    634    639    644    650    655    660    666    671    677    682    6 3.6
807    687    693    698    703    709    714    720    725    730    736    7 4.2
808    741    747    752    757    763    768    773    779    784    789    8 4.8
809    795    800    806    811    816    822    827    832    838    843    9 5.4
810  90849  90854  90859  90865  90870  90875  90881  90886  90891  90897
811    902    907    913    918    924    929    934    940    945    950
812    956    961    966    972    977    982    988    993    998  91004
813  91009  91014  91020  91025  91030  91036  91041  91046  91052    057
814    062    068    073    078    084    089    094    100    105    110
815    116    121    126    132    137    142    148    153    158    164
816    169    174    180    185    190    196    201    206    212    217
817    222    228    233    238    243    249    254    259    265    270
818    275    281    286    291    297    302    307    312    318    323
819    328    334    339    344    350    355    360    365    371    376
820  91381  91387  91392  91397  91403  91408  91413  91418  91424  91429      5
821    434    440    445    450    455    461    466    471    477    482    1 0.5
822    487    492    498    503    508    514    519    524    529    535    2 1.0
823    540    545    551    556    561    566    572    577    582    587    3 1.5
824    593    598    603    609    614    619    624    630    635    640    4 2.0
825    645    651    656    661    666    672    677    682    687    693    5 2.5
826    698    703    709    714    719    724    730    735    740    745    6 3.0
827    751    756    761    766    772    777    782    787    793    798    7 3.5
828    803    808    814    819    824    829    834    840    845    850    8 4.0
829    855    861    866    871    876    882    887    892    897    903    9 4.5
830  91908  91913  91918  91924  91929  91934  91939  91944  91950  91955
831    960    965    971    976    981    986    991    997  92002  92007
832  92012  92018  92023  92028  92033  92038  92044  92049    054    059
833    065    070    075    080    085    091    096    101    106    111
834    117    122    127    132    137    143    148    153    158    163
835    169    174    179    184    189    195    200    205    210    215
836    221    226    231    236    241    247    252    257    262    267
837    273    278    283    288    293    298    304    309    314    319
838    324    330    335    340    345    350    355    361    366    371
839    376    381    387    392    397    402    407    412    418    423
840  92428  92433  92438  92443  92449  92454  92459  92464  92469  92474
841    480    485    490    495    500    505    511    516    521    526
842    531    536    542    547    552    557    562    567    572    578
843    583    588    593    598    603    609    614    619    624    629
844    634    639    645    650    655    660    665    670    675    681
845    686    691    696    701    706    711    716    722    727    732
846    737    742    747    752    758    763    768    773    778    783
847    788    793    799    804    809    814    819    824    829    834
848    840    845    850    855    860    865    870    875    881    886
849    891    896    901    906    911    916    921    927    932    937
850  92942  92947  92952  92957  92962  92967  92973  92978  92983  92988
  N      0      1      2      3      4      5      6      7      8      9












1 2 


850 92 942 92947 

851 ” 993 998 

852 93044 93049 

853 095 100 

854 146 151 

855 197 202 

856 247 252 

857 298 303 

858 349 354 

Ji59 399 404 

860 93450 93 455 

861 500 505 

862 551 556 

863 601 606 

864 651 656 

865 702 707 

866 752 757 

867 802 807 

868 852 857 

869 902 _?07_ 

870 93 952 93 057 

871 94002 94007 

872 052 057 

873 101 106 

874 151 156 

875 201 206 

876 250 255 


Numbers 850-900 Logs 92942-95468 


I 6 7 8 9 II 


085 090 


877 300 

878 349 

879 _ J99 

880 94448.1 

881 498 

882 547 

883 596 

884 645 

885 694 

886 743 

887 792 

888 841 

_889. 890 

890 9 4939 

891 988 

892 95036 

893 085 

894 134 

895 182 

896 231 

897 279 

898 328 

899_ 376 

900 95 424 

N 0 


156 161 166 

206 211 216 

255 260 265 

305 310 315 

354 359 364 

404 409. _414_ 

94 454. 94458. 94 463. 1 
503 507 512 

552 557 562 

601 606 611 

650 655 660 

699 704 709 

748 753 758 

797 802 807 

846 851 856 

895. 900 905 

94944. 94949 94 954 
993 ~ 998 95002 

95041 95046 051 

090 095 100 


143 148 

192 197 

240 245 

289 294 

337 342 

386 390 

' 95434 95439 
“2 1 


92 967 

>2 973 

22 978 9 

93 018 

?3 024 

33 029 9 

069 

075 

080 

120 

125 

131 

171 

176 

181 

222 

227 

232 

273 

278 

283 

323 

328 

334 

374 

379 

384 

425 

430 

435 

93 475 

23 480 

33 4X5 9 

526 

531 

536 

576 

581 

586 

626 

631 

636 

676 

682 

687 

727 

732 

737 

777 

782 

787 

827 

832 

837 

877 

882 

887 

927 

932 

937 

93 977 

93 982 

93987 9 

94027 

94032 

94037 9 

077 

082 

086 

126 

131 

136 

176 

181 

186 

226 

231 

236 

275 

280 

285 

i 325 

330 

335 

374 

379 

384 

L 424_ 

429. 

_ 433 

r 94 47i 

94478 

94 483 < 

522 

527 

532 

571 

576 1 

581 

> 621 

626 

630 

i 670 

675 

680 

r 719 

724 

729 

; 768 

773 

778 

5 817 

822 

827 

866 

871 

876 

)_ 915_ 

912_ 

924. 

) 94963 

94968 

94973^ 

J 95 012 

95 017 

95 022 ! 

> 061 

066 

071 

5 109 

114 

119 

5 158 

163 

168 

l 207 

211 

216 

) 255 

260 

265 

) 303 

308 

313 

7 352 

357 

361 

5 400 

405 

410 

1 95448 

195453 

95458 

5 

1 6 

7 


136 141 
186 192 
237 242 
288 293 
339 344 
389 394 
440 445 


340 345 

389 394 

438. 443. 

1488 94493. 
537 542 

586 591 

635 640 

685 689 

734 738 

783 787 


318 323 
366 371 
415 419 










Numbers 950-1000 Logs 97772-00039

   N      0      1      2      3      4      5      6      7      8      9    P.P.
 950  97772  97777  97782  97786  97791  97795  97800  97804  97809  97813      5
 951    818    823    827    832    836    841    845    850    855    859    1 0.5
 952    864    868    873    877    882    886    891    896    900    905    2 1.0
 953    909    914    918    923    928    932    937    941    946    950    3 1.5
 954    955    959    964    968    973    978    982    987    991    996    4 2.0
 955  98000  98005  98009  98014  98019  98023  98028  98032  98037  98041    5 2.5
 956    046    050    055    059    064    068    073    078    082    087    6 3.0
 957    091    096    100    105    109    114    118    123    127    132    7 3.5
 958    137    141    146    150    155    159    164    168    173    177    8 4.0
 959    182    186    191    195    200    204    209    214    218    223    9 4.5
 960  98227  98232  98236  98241  98245  98250  98254  98259  98263  98268
 961    272    277    281    286    290    295    299    304    308    313
 962    318    322    327    331    336    340    345    349    354    358
 963    363    367    372    376    381    385    390    394    399    403
 964    408    412    417    421    426    430    435    439    444    448
 965    453    457    462    466    471    475    480    484    489    493
 966    498    502    507    511    516    520    525    529    534    538
 967    543    547    552    556    561    565    570    574    579    583
 968    588    592    597    601    605    610    614    619    623    628
 969    632    637    641    646    650    655    659    664    668    673
 970  98677  98682  98686  98691  98695  98700  98704  98709  98713  98717      4
 971    722    726    731    735    740    744    749    753    758    762    1 0.4
 972    767    771    776    780    784    789    793    798    802    807    2 0.8
 973    811    816    820    825    829    834    838    843    847    851    3 1.2
 974    856    860    865    869    874    878    883    887    892    896    4 1.6
 975    900    905    909    914    918    923    927    932    936    941    5 2.0
 976    945    949    954    958    963    967    972    976    981    985    6 2.4
 977    989    994    998  99003  99007  99012  99016  99021  99025  99029    7 2.8
 978  99034  99038  99043    047    052    056    061    065    069    074    8 3.2
 979    078    083    087    092    096    100    105    109    114    118    9 3.6
 980  99123  99127  99131  99136  99140  99145  99149  99154  99158  99162
 981    167    171    176    180    185    189    193    198    202    207
 982    211    216    220    224    229    233    238    242    247    251
 983    255    260    264    269    273    277    282    286    291    295
 984    300    304    308    313    317    322    326    330    335    339
 985    344    348    352    357    361    366    370    374    379    383
 986    388    392    396    401    405    410    414    419    423    427
 987    432    436    441    445    449    454    458    463    467    471
 988    476    480    484    489    493    498    502    506    511    515
 989    520    524    528    533    537    542    546    550    555    559
 990  99564  99568  99572  99577  99581  99585  99590  99594  99599  99603
 991    607    612    616    621    625    629    634    638    642    647
 992    651    656    660    664    669    673    677    682    686    691
 993    695    699    704    708    712    717    721    726    730    734
 994    739    743    747    752    756    760    765    769    774    778
 995    782    787    791    795    800    804    808    813    817    822
 996    826    830    835    839    843    848    852    856    861    865
 997    870    874    878    883    887    891    896    900    904    909
 998    913    917    922    926    930    935    939    944    948    952
 999    957    961    965    970    974    978    983    987    991    996
1000  00000  00004  00009  00013  00017  00022  00026  00030  00035  00039
   N      0      1      2      3      4      5      6      7      8      9







Numbers 1000-1050 Logs 0000000-0215614 


   N             0      1      2      3      4      5      6      7      8      9
1000  000     0000   0434   0869   1303   1737   2171   2605   3039   3473   3907
  01          4341   4775   5208   5642   6076   6510   6943   7377   7810   8244
  02          8677   9111   9544   9977  *0411  *0844  *1277  *1710  *2143  *2576
  03  001     3009   3442   3875   4308   4741   5174   5607   6039   6472   6905
  04          7337   7770   8202   8635   9067   9499   9932  *0364  *0796  *1228
  05  002     1661   2093   2525   2957   3389   3821   4253   4685   5116   5548
  06          5980   6411   6843   7275   7706   8138   8569   9001   9432   9863
  07  003     0295   0726   1157   1588   2019   2451   2882   3313   3744   4174
  08          4605   5036   5467   5898   6328   6759   7190   7620   8051   8481
  09          8912   9342   9772  *0203  *0633  *1063  *1493  *1924  *2354  *2784
1010  004     3214   3644   4074   4504   4933   5363   5793   6223   6652   7082
  11          7512   7941   8371   8800   9229   9659  *0088  *0517  *0947  *1376
  12  005     1805   2234   2663   3092   3521   3950   4379   4808   5237   5666
  13          6094   6523   6952   7380   7809   8238   8666   9094   9523   9951
  14  006     0380   0808   1236   1664   2092   2521   2949   3377   3805   4233
  15          4660   5088   5516   5944   6372   6799   7227   7655   8082   8510
  16          8937   9365   9792  *0219  *0647  *1074  *1501  *1928  *2355  *2782
  17  007     3210   3637   4064   4490   4917   5344   5771   6198   6624   7051
  18          7478   7904   8331   8757   9184   9610  *0037  *0463  *0889  *1316
  19  008     1742   2168   2594   3020   3446   3872   4298   4724   5150   5576
1020          6002   6427   6853   7279   7704   8130   8556   8981   9407   9832
  21  009     0257   0683   1108   1533   1959   2384   2809   3234   3659   4084
  22          4509   4934   5359   5784   6208   6633   7058   7483   7907   8332
  23          8756   9181   9605  *0030  *0454  *0878  *1303  *1727  *2151  *2575
  24  010     3000   3424   3848   4272   4696   5120   5544   5967   6391   6815
  25          7239   7662   8086   8510   8933   9357   9780  *0204  *0627  *1050
  26  011     1474   1897   2320   2743   3166   3590   4013   4436   4859   5282
  27          5704   6127   6550   6973   7396   7818   8241   8664   9086   9509
  28          9931  *0354  *0776  *1198  *1621  *2043  *2465  *2887  *3310  *3732
  29  012     4154   4576   4998   5420   5842   6264   6685   7107   7529   7951
1030          8372   8794   9215   9637  *0059  *0480  *0901  *1323  *1744  *2165
  31  013     2587   3008   3429   3850   4271   4692   5113   5534   5955   6376
  32          6797   7218   7639   8059   8480   8901   9321   9742  *0162  *0583
  33  014     1003   1424   1844   2264   2685   3105   3525   3945   4365   4785
  34          5205   5625   6045   6465   6885   7305   7725   8144   8564   8984
  35          9403   9823  *0243  *0662  *1082  *1501  *1920  *2340  *2759  *3178
  36  015     3598   4017   4436   4855   5274   5693   6112   6531   6950   7369
  37          7788   8206   8625   9044   9462   9881  *0300  *0718  *1137  *1555
  38  016     1974   2392   2810   3229   3647   4065   4483   4901   5319   5737
  39          6155   6573   6991   7409   7827   8245   8663   9080   9498   9916
1040  017     0333   0751   1168   1586   2003   2421   2838   3256   3673   4090
  41          4507   4924   5342   5759   6176   6593   7010   7427   7844   8260
  42          8677   9094   9511   9927  *0344  *0761  *1177  *1594  *2010  *2427
  43  018     2843   3259   3676   4092   4508   4925   5341   5757   6173   6589
  44          7005   7421   7837   8253   8669   9084   9500   9916  *0332  *0747
  45  019     1163   1578   1994   2410   2825   3240   3656   4071   4486   4902
  46          5317   5732   6147   6562   6977   7392   7807   8222   8637   9052
  47          9467   9882  *0296  *0711  *1126  *1540  *1955  *2369  *2784  *3198
  48  020     3613   4027   4442   4856   5270   5684   6099   6513   6927   7341
  49          7755   8169   8583   8997   9411   9824  *0238  *0652  *1066  *1479
1050  021     1893   2307   2720   3134   3547   3961   4374   4787   5201   5614
   N             0      1      2      3      4      5      6      7      8      9














APPENDIX D 


Selected Reference List 


Barlow's Tables of Squares, Cubes, Square Roots, Cube Roots, and
Reciprocals of all integer numbers up to 10,000. E. and F. N.
Spon, Ltd., London.

Chapin, F. S., Field Work and Social Research, The Century Co.,
New York.

Chaddock, R. E., Principles and Methods of Statistics, Houghton
Mifflin Co., Boston.

Dubois, Florence, A Guide to Statistics of Social Welfare in New
York City, Welfare Council of New York City, New York.

Dunlap, J. W., and Kurtz, A. K., Handbook of Statistical Nomographs,
Tables and Formulas, World Book Co., Yonkers-on-Hudson, New York.

Ezekiel, Mordecai, Methods of Correlation Analysis, John Wiley &
Sons, New York.

Fry, C. Luther, “Making Use of Census Data,” Jour. Amer. Stat.
Ass’n, Columbia University, New York, June, 1930.

Glover, J. W., Tables of Applied Mathematics in Finance, Insurance
and Statistics, Millard Press, Ann Arbor, Mich.

Hexter, Maurice B., Social Consequences of Business Cycles, Houghton
Mifflin Co., Boston.

Journal of the American Statistical Association, Columbia University,
New York.

Kelley, T. L., Statistical Method, Macmillan Co., New York.

King, W. I., Index Numbers Elucidated, Longmans, Green & Co.,
New York.

Macaulay, F. R., The Smoothing of Time Series, National Bureau of
Economic Research, Inc., New York.

McMillen, A. W., Measurement in Social Work, University of Chicago
Press, Chicago.

Mills, F. C., Statistical Methods, Henry Holt & Co., New York.


Mudgett, Bruce D., Statistical Tables and Graphs, Houghton Mifflin
Co., Boston.

Pearl, Raymond, Medical Biometry and Statistics, W. B. Saunders
Co., Philadelphia.

Pearson, Karl, Tables for Statisticians and Biometricians, Cambridge
University Press, Cambridge.

Proceedings of the American Statistical Association, Columbia
University, New York.

Rice, Stuart A. (editor), Statistics in Social Studies, University of
Pennsylvania Press, Philadelphia.

Rietz, H. L. (editor), Handbook of Mathematical Statistics, Houghton
Mifflin Co., Boston.

Schmeckebier, L. F., The Statistical Work of the National Government,
Johns Hopkins Press, Baltimore.

Thomas, Dorothy S., Social Aspects of the Business Cycle, Routledge,
London.

Thurstone, L. L., The Fundamentals of Statistics, Macmillan Co.,
New York.

Thurstone, L. L., and Chave, E. J., The Measurement of Attitude,
University of Chicago Press, Chicago.

Walker, Helen M., Studies in the History of Statistical Method,
Williams and Wilkins, Baltimore.

Weld, L. D., Theory of Errors and Least Squares, Macmillan Co.,
New York.




INDEX 


Accuracy of observation, relativity of, 
99, 100 

Arithmetic mean, 214-222 

computed from ungrouped data, 214, 
215 

computed from grouped data, by 
long method, 215-21 7 
computed from grouped data, by 
short method, 2 17-2 19 
weighted mean, 219-222 
Array, the, 124-128 
Assembling data, 116-118 
by machine, 116, 117 
by hand, 117, 118 
Average, definition, 199-202 
Average deviation, 238-2 1 

computed from ungrouped data, 238, 
239 

computed from grouped data, long 
method, 239-241 

computed from grouped data, short 
method, 241, 242 

Averages, relations among, 225-227 

Bar chart, 176, 178-180 
Binomial expansion and chance distri- 
bution, 318-323 
Birth rates, 393, 394 

Cartograms, 18 1-186 
Case study, 61-65 
Circle chart, 180, 181 
Classification of data, 119, 120 
Collection of primary data, 101-116 
Construction of tables, 1 20-1 24 
Correlation, concept of, 277-279 
Correlation, measurement of, 296-311 
linear correlation, 297-302 
curvilinear correlation, 302-308 
correlation of grouped data, 308-311 
Correlation of time series, 3 7 7-3 81 
synchronous data, 378, 379 
lagged data, 379, 3 80 


Cube chart, 177 
Cumulative charts, 154-159 
Curve fitting, 283 

straight line, 283-288 
types of curves, 289, 290 
logarithmic curve, 291-293 
parabolic curve, 293-296 
Cyclical fluctuations, 370 

computation for annual data, 371, 
37 * 

computation for monthly data, 372- 
375 

cycles in units of o', 375-377 

Death rates, 395-399 

standard million population, 396 
corrected death rate, 396-398 
Diagrammatic chart, 186 
Dispersion, definition of, 230-232 
Dispersion, relations among measures 
of, 246-249 

Frequency distribution, 128-133 
definition, 128, 129 
class-interval, size of, 129-132
redistribution of classes, 132, 133 
limits of class-interval, 133 
Frequency polygon, 168-176 
Function, meaning of, 279-283 

Geometric mean, 222-225 
Graphs, definitions of, 136-139 

Histogram, 160-168

Index numbers, 254-273 
definition of, 254-256 
applied to social data, 256, 257 
in time and geographic series, 257- 
261 

types of index numbers, 261-270 
the “best” formula, 270-273 



Logarithms, principles of, 142-145 

Machine tabulation, 84-87 
Marriage and divorce rates, 392, 393 
Median, 208-214 

median position, 208, 209 
location by formula, 210, 211 
graphic location, 211-214
Mode, 202-208 

graphic location, 203, 204 
location in an array, 204, 205 
location by re-grouping, 205, 206 
location by formula, 206-208 
Morbidity, 399, 400

Normal curve of error, 323-335 

testing normality by the method of 
moments, 325-327 

fitting a normal curve to data, 327- 
333 

tests of goodness of fit, 333-335 

Percentiles, 234-238 
Population growth, estimating, 385 
arithmetic method, 385-387 
geometric method, 387-389 
Whelpton’s method, 389, 390 
graphic method of breaking down 
age groups, 390-392 
Primary data, definition of, 81
Primary sources, a problem requiring, 
82-89 

Probability, definition of, 317, 318 

Quantitative data, 65-80 
definition, 65, 66 

continuous and discontinuous variables, 67

independent and dependent variables, 
68, 69 

multiplicity of factors, 70-72 
homogeneity, 72-74 
logic and statistics, 74-79 
scientific law, 79, 80

Quartile deviation, 232-234 
Questionnaires, 107-110 

Rating scales, function of, 405-409 


Rating scales, types of, 409 
scale for blindness, 409-411
Chapin’s scale, 411-415
psychoneurotic inventory, 415-419 
Thurstone-Chave attitude scale, 419-422

Rectangular coordinates, 139-142 
Relative variability, coefficient of, 249, 
250 

Report forms, official, 101-107 

Sampling errors, 336-340 
Seasonal fluctuations, 359 

multiple frequency table, 362 
index based upon monthly means, 
363-365 

index by the mean-median method, 
365-368 

index by ratio-to-ordinate method, 
368-370 

Secondary data, definition of, 81 
Secondary sources, a problem requiring, 89-95

Secular trend, 346-359 

straight line trend, 347-350, 353-355

moving average, 349-353
parabolic trend, 355, 356 
logarithmic trend, 357, 358 
comparison of trend values, 358, 359 
Semi-logarithmic charts, 149-154 
Skewness, 250, 251 

Social problems, interrelationships 
among, 27, 28 

Social statistics, definition, 3-5 
Standard deviation, 243-246 

computation by long method, 243,
244

computation by short method, 244, 
245 

Standard rules for graphic presentation, 187-194

Statistical organization, 56-59 
Statistics, 5-27 
education, 5-7 
employment, 7-10 
poverty, 10-13 
old age, 13-14 

dependent and neglected children, 
14-17 



divorce, 17, 18
crime and delinquency, 18-21 
birth and death rates, 21, 22 
morbidity, 22-24 
insanity, 24-26 
mental deficiency, 26, 27 
published, 29-56 

value of knowledge of sources, 30-33 
federal government statistics, 33-46 
social statistics of states, 46-48 


private organizations, 48-55 
individual agencies and institutions, 
54-56 

Straight line graph, 145-148 
Surface chart, 177 
Survey schedules, 110-116

Time as a category, 343-346 

Vital statistics, scope of, 384, 385 









